[jira] [Work logged] (BEAM-4086) KafkaIOTest is flaky

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4086?focusedWorklogId=93081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93081
 ]

ASF GitHub Bot logged work on BEAM-4086:


Author: ASF GitHub Bot
Created on: 20/Apr/18 05:29
Start Date: 20/Apr/18 05:29
Worklog Time Spent: 10m 
  Work Description: rangadi commented on issue #5185: [BEAM-4086]: KafkaIO 
tests:  Avoid busy loop in MockConsumer.poll(), reduce flakes.
URL: https://github.com/apache/beam/pull/5185#issuecomment-382981645
 
 
   +R: @XuMingmin 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 93081)
Time Spent: 0.5h  (was: 20m)

> KafkaIOTest is flaky
> 
>
> Key: BEAM-4086
> URL: https://issues.apache.org/jira/browse/BEAM-4086
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kafka
>Affects Versions: 2.5.0
>Reporter: Ismaël Mejía
>Assignee: Raghu Angadi
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Noticed this while trying to make a simple change in KafkaIO this morning, and 
> confirmed it with other contributors. If you run 
> `./gradlew -p sdks/java/io/kafka/ clean build`, it blocks indefinitely at least 
> a third of the time. However, it passes with Maven.
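
For context on the "busy loop in MockConsumer.poll()" mentioned in the work description above: the change itself is not shown in this message, so the following is only a minimal sketch of the general idea, assuming a mock source that should wait for data instead of returning immediately. `BlockingMockSource` and its methods are hypothetical names, not Kafka's `MockConsumer` and not the code from PR #5185.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a mock consumer; this is not Kafka's MockConsumer. */
final class BlockingMockSource {
  private final BlockingQueue<String> records = new LinkedBlockingQueue<>();

  /** Makes a record available to the next poll() call. */
  void add(String record) {
    records.add(record);
  }

  /**
   * Waits up to timeoutMs for a first record instead of returning immediately,
   * so a reader that calls poll() in a loop does not spin on an empty source.
   */
  List<String> poll(long timeoutMs) {
    List<String> batch = new ArrayList<>();
    try {
      String first = records.poll(timeoutMs, TimeUnit.MILLISECONDS);
      if (first != null) {
        batch.add(first);
        records.drainTo(batch); // also take whatever is already queued
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
    }
    return batch;
  }
}
```

A reader thread that calls `poll(timeoutMs)` repeatedly then sleeps inside the queue while no records are available, rather than consuming a CPU core that the test threads may need.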



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3886) Python SDK harness does not contact State API if not needed

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3886?focusedWorklogId=93080&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93080
 ]

ASF GitHub Bot logged work on BEAM-3886:


Author: ASF GitHub Bot
Created on: 20/Apr/18 05:24
Start Date: 20/Apr/18 05:24
Worklog Time Spent: 10m 
  Work Description: tweise opened a new pull request #5192: [BEAM-3886] 
Have the Python SDK use the state api service descriptor on the process bundle 
descriptor.
URL: https://github.com/apache/beam/pull/5192
 
 
   This is the commit from @lukecwik that I rebased and tested with the 
portable Flink runner on the hacking-job-server branch. Without this change, 
the SDK cannot communicate with the runner.
   
   CC: @bsidhom @aaltay @robertwb 
   
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 93080)
Time Spent: 10m
Remaining Estimate: 0h

> Python SDK harness does not contact State API if not needed
> ---
>
> Key: BEAM-3886
> URL: https://issues.apache.org/jira/browse/BEAM-3886
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Ben Sidhom
>Assignee: Ahmet Altay
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Python harness always talks to the State API, even if it is never used by 
> the current process bundle. As a minor optimization and to make implementing 
> new runners easier, the harness should not talk to the State server unless 
> it's actually needed.
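
The optimization described above amounts to lazy initialization of the State API connection. The sketch below illustrates that pattern only; it is written in Java for consistency with the other code in this digest, whereas the harness in question is Python, and `Lazy` and its methods are hypothetical names rather than Beam APIs.

```java
import java.util.function.Supplier;

/** Hypothetical helper: creates the wrapped resource on first use only. */
final class Lazy<T> {
  private final Supplier<T> factory;
  private T value;
  private boolean created;

  Lazy(Supplier<T> factory) {
    this.factory = factory;
  }

  /** Creates the value on the first call and returns the same value afterwards. */
  synchronized T get() {
    if (!created) {
      value = factory.get();
      created = true;
    }
    return value;
  }

  /** Reports whether the factory has run, i.e. whether anything was contacted. */
  synchronized boolean isCreated() {
    return created;
  }
}
```

Under this assumption, a bundle that never reads or writes state never calls `get()`, so no connection to the State server is opened for it.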





[jira] [Work logged] (BEAM-4038) Support Kafka Headers in KafkaIO

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4038?focusedWorklogId=93077&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93077
 ]

ASF GitHub Bot logged work on BEAM-4038:


Author: ASF GitHub Bot
Created on: 20/Apr/18 05:19
Start Date: 20/Apr/18 05:19
Worklog Time Spent: 10m 
  Work Description: rangadi commented on a change in pull request #5111: 
[BEAM-4038] Support Kafka Headers in KafkaIO
URL: https://github.com/apache/beam/pull/5111#discussion_r182947417
 
 

 ##
 File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaRecordCoder.java
 ##
 @@ -66,9 +76,36 @@ public void encode(KafkaRecord value, OutputStream 
outStream) throws IOExc
 longCoder.decode(inStream),
 longCoder.decode(inStream),
 KafkaTimestampType.forOrdinal(intCoder.decode(inStream)),
+(Headers) toHeaders(headerCoder.decode(inStream)),
 kvCoder.decode(inStream));
   }
 
+  private Object toHeaders(Iterable> records) {
+// ConsumerRecord is used to simply create a list of headers
+ConsumerRecord consumerRecord = new ConsumerRecord<>("", 
0, 0L, "", "");
+
+if (!ConsumerSpEL.hasHeaders) {
+  return null;
+} else if (!records.iterator().hasNext()) {
+  return consumerRecord.headers();
+}
+
+records.forEach(kv -> consumerRecord.headers().add(kv.getKey(), 
kv.getValue()));
+return consumerRecord.headers();
+  }
+
+  private Iterable> toIterable(KafkaRecord record) {
+if (!ConsumerSpEL.hasHeaders || record.getHeaders() == null){
 
 Review comment:
   record.getHeaders() is never expected to return null (or did you mean to 
check `record.headers == null`?)




Issue Time Tracking
---

Worklog Id: (was: 93077)
Time Spent: 8h 20m  (was: 8h 10m)

> Support Kafka Headers in KafkaIO
> 
>
> Key: BEAM-4038
> URL: https://issues.apache.org/jira/browse/BEAM-4038
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Geet Kumar
>Assignee: Raghu Angadi
>Priority: Minor
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Headers have been added to Kafka Consumer/Producer records (KAFKA-4208). The 
> purpose of this JIRA is to support this feature in KafkaIO.  
>  





[jira] [Work logged] (BEAM-4038) Support Kafka Headers in KafkaIO

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4038?focusedWorklogId=93078&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93078
 ]

ASF GitHub Bot logged work on BEAM-4038:


Author: ASF GitHub Bot
Created on: 20/Apr/18 05:19
Start Date: 20/Apr/18 05:19
Worklog Time Spent: 10m 
  Work Description: rangadi commented on a change in pull request #5111: 
[BEAM-4038] Support Kafka Headers in KafkaIO
URL: https://github.com/apache/beam/pull/5111#discussion_r182947337
 
 

 ##
 File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaRecordCoder.java
 ##
 @@ -66,9 +76,36 @@ public void encode(KafkaRecord value, OutputStream 
outStream) throws IOExc
 longCoder.decode(inStream),
 longCoder.decode(inStream),
 KafkaTimestampType.forOrdinal(intCoder.decode(inStream)),
+(Headers) toHeaders(headerCoder.decode(inStream)),
 kvCoder.decode(inStream));
   }
 
+  private Object toHeaders(Iterable> records) {
+// ConsumerRecord is used to simply create a list of headers
+ConsumerRecord consumerRecord = new ConsumerRecord<>("", 
0, 0L, "", "");
+
+if (!ConsumerSpEL.hasHeaders) {
+  return null;
+} else if (!records.iterator().hasNext()) {
 
 Review comment:
   You can remove the else part.
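
To illustrate the suggestion: once the first branch returns, the `else` keyword is redundant, and because iterating an empty input adds nothing, the empty case may not need its own branch either. The sketch below uses only JDK types so that it compiles on its own; `HeaderConversionSketch` and the `Map.Entry<String, String>` element type are assumptions made for illustration, not the types used in `KafkaRecordCoder`, and the thread does not show which shape the author finally adopted.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class HeaderConversionSketch {
  // Hypothetical stand-in for ConsumerSpEL.hasHeaders in the diff above.
  static boolean hasHeaders = true;

  /** Returns null when headers are unsupported, otherwise a (possibly empty) list. */
  static List<Map.Entry<String, String>> toHeaders(Iterable<Map.Entry<String, String>> records) {
    if (!hasHeaders) {
      return null; // early return, so no else branch is needed below
    }
    List<Map.Entry<String, String>> headers = new ArrayList<>();
    // Iterating an empty input adds nothing, so it needs no special case.
    for (Map.Entry<String, String> kv : records) {
      headers.add(new SimpleEntry<>(kv.getKey(), kv.getValue()));
    }
    return headers;
  }
}
```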




Issue Time Tracking
---

Worklog Id: (was: 93078)
Time Spent: 8h 20m  (was: 8h 10m)



[jira] [Work logged] (BEAM-4038) Support Kafka Headers in KafkaIO

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4038?focusedWorklogId=93075&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93075
 ]

ASF GitHub Bot logged work on BEAM-4038:


Author: ASF GitHub Bot
Created on: 20/Apr/18 04:55
Start Date: 20/Apr/18 04:55
Worklog Time Spent: 10m 
  Work Description: gkumar7 commented on a change in pull request #5111: 
[BEAM-4038] Support Kafka Headers in KafkaIO
URL: https://github.com/apache/beam/pull/5111#discussion_r182945790
 
 

 ##
 File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConsumerSpEL.java
 ##
 @@ -139,4 +156,11 @@ public long offsetForTime(Consumer consumer, 
TopicPartition topicPartition
   return offsetAndTimestamp.offset();
 }
   }
+
+  public Headers getHeaders(ConsumerRecord rawRecord) {
 
 Review comment:
   Done




Issue Time Tracking
---

Worklog Id: (was: 93075)
Time Spent: 8h 10m  (was: 8h)



[jira] [Work logged] (BEAM-4044) Take advantage of Calcite DDL

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4044?focusedWorklogId=93068&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93068
 ]

ASF GitHub Bot logged work on BEAM-4044:


Author: ASF GitHub Bot
Created on: 20/Apr/18 04:25
Start Date: 20/Apr/18 04:25
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#5154: [BEAM-4044] [SQL] Make BeamCalciteTable self planning
URL: https://github.com/apache/beam/pull/5154#discussion_r182943163
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSinkRel.java
 ##
 @@ -20,61 +20,81 @@
 import com.google.common.base.Joiner;
 import java.util.List;
 import org.apache.beam.sdk.extensions.sql.BeamSqlTable;
-import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv;
+import org.apache.beam.sdk.extensions.sql.impl.rule.BeamIOSinkRule;
 import org.apache.beam.sdk.transforms.PTransform;
 import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.PCollectionTuple;
 import org.apache.beam.sdk.values.Row;
 import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptPlanner;
 import org.apache.calcite.plan.RelOptTable;
 import org.apache.calcite.plan.RelTraitSet;
 import org.apache.calcite.prepare.Prepare;
 import org.apache.calcite.rel.RelNode;
 import org.apache.calcite.rel.core.TableModify;
 import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql2rel.RelStructuredTypeFlattener;
 
 /** BeamRelNode to replace a {@code TableModify} node. */
-public class BeamIOSinkRel extends TableModify implements BeamRelNode {
+public class BeamIOSinkRel extends TableModify
+implements BeamRelNode, RelStructuredTypeFlattener.SelfFlatteningRel {
 
-  private final BeamSqlEnv sqlEnv;
+  private final BeamSqlTable sqlTable;
+  private boolean isFlattening = false;
 
   public BeamIOSinkRel(
-  BeamSqlEnv sqlEnv,
   RelOptCluster cluster,
-  RelTraitSet traits,
   RelOptTable table,
   Prepare.CatalogReader catalogReader,
   RelNode child,
   Operation operation,
   List updateColumnList,
   List sourceExpressionList,
-  boolean flattened) {
+  boolean flattened,
+  BeamSqlTable sqlTable) {
 super(
 cluster,
-traits,
+cluster.traitSetOf(BeamLogicalConvention.INSTANCE),
 table,
 catalogReader,
 child,
 operation,
 updateColumnList,
 sourceExpressionList,
 flattened);
-this.sqlEnv = sqlEnv;
+this.sqlTable = sqlTable;
   }
 
   @Override
   public RelNode copy(RelTraitSet traitSet, List inputs) {
-return new BeamIOSinkRel(
-sqlEnv,
+boolean flattened = isFlattened() || isFlattening;
+BeamIOSinkRel newRel = new BeamIOSinkRel(
 getCluster(),
-traitSet,
 getTable(),
 getCatalogReader(),
 sole(inputs),
 getOperation(),
 getUpdateColumnList(),
 getSourceExpressionList(),
-isFlattened());
+flattened,
+sqlTable);
+newRel.traitSet = traitSet;
+return newRel;
+  }
+
+  @Override
+  public void flattenRel(RelStructuredTypeFlattener flattener) {
+// rewriteGeneric calls this.copy. Setting isFlattining passes
+// this context into copy for modificaiton of the flattened flag.
 
 Review comment:
   `modification`




Issue Time Tracking
---

Worklog Id: (was: 93068)
Time Spent: 6h 10m  (was: 6h)

> Take advantage of Calcite DDL
> -
>
> Key: BEAM-4044
> URL: https://issues.apache.org/jira/browse/BEAM-4044
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> In Calcite 1.15, support for abstract DDL moved into Calcite core. We should 
> take advantage of that.





[jira] [Work logged] (BEAM-3983) BigQuery writes from pure SQL

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3983?focusedWorklogId=93067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93067
 ]

ASF GitHub Bot logged work on BEAM-3983:


Author: ASF GitHub Bot
Created on: 20/Apr/18 04:23
Start Date: 20/Apr/18 04:23
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #4947: [BEAM-3983] Add 
utils for converting to BigQuery types
URL: https://github.com/apache/beam/pull/4947#issuecomment-382971118
 
 
   Well, now it looks as though you are doomed by the "two executor" problem, in 
which we somewhat blatantly use more resources than our Jenkins workers can 
handle. The change itself looks fine, and I think we'll need to solve this 
immediately or find a workaround so people can get things done.




Issue Time Tracking
---

Worklog Id: (was: 93067)
Time Spent: 7h  (was: 6h 50m)

> BigQuery writes from pure SQL
> -
>
> Key: BEAM-3983
> URL: https://issues.apache.org/jira/browse/BEAM-3983
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> It would be nice if you could write to BigQuery in SQL without writing any 
> Java code. For example:
> {code:java}
> INSERT INTO bigquery SELECT * FROM PCOLLECTION{code}





[jira] [Work logged] (BEAM-4038) Support Kafka Headers in KafkaIO

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4038?focusedWorklogId=93064&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93064
 ]

ASF GitHub Bot logged work on BEAM-4038:


Author: ASF GitHub Bot
Created on: 20/Apr/18 04:14
Start Date: 20/Apr/18 04:14
Worklog Time Spent: 10m 
  Work Description: rangadi commented on a change in pull request #5111: 
[BEAM-4038] Support Kafka Headers in KafkaIO
URL: https://github.com/apache/beam/pull/5111#discussion_r182942212
 
 

 ##
 File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConsumerSpEL.java
 ##
 @@ -139,4 +156,11 @@ public long offsetForTime(Consumer consumer, 
TopicPartition topicPartition
   return offsetAndTimestamp.offset();
 }
   }
+
+  public Headers getHeaders(ConsumerRecord rawRecord) {
 
 Review comment:
   Yes, that's what I was thinking. 'hasHeaders' is already used there.




Issue Time Tracking
---

Worklog Id: (was: 93064)
Time Spent: 8h  (was: 7h 50m)



[jira] [Work logged] (BEAM-4135) Remove Use of Java SDK Types in the DirectRunner "engine"

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4135?focusedWorklogId=93061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93061
 ]

ASF GitHub Bot logged work on BEAM-4135:


Author: ASF GitHub Bot
Created on: 20/Apr/18 03:56
Start Date: 20/Apr/18 03:56
Worklog Time Spent: 10m 
  Work Description: tgroh closed pull request #5177: [BEAM-4135] Stop 
taking the whole result in WatermarkManager
URL: https://github.com/apache/beam/pull/5177
 
 
   

This PR was merged from a forked repository. Because GitHub hides the
original diff of a fork's pull request once it is merged, the diff is
reproduced below for the sake of provenance:

diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/EvaluationContext.java
 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/EvaluationContext.java
index 8f0dd423125..bfa65cd2d8e 100644
--- 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/EvaluationContext.java
+++ 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/EvaluationContext.java
@@ -31,7 +31,6 @@
 import java.util.Set;
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.ConcurrentMap;
-import javax.annotation.Nullable;
 import org.apache.beam.runners.core.ReadyCheckingSideInputReader;
 import org.apache.beam.runners.core.SideInputReader;
 import org.apache.beam.runners.core.TimerInternals.TimerData;
@@ -141,7 +140,7 @@ public void initialize(
* @return the committed bundles contained within the handled {@code result}
*/
   public CommittedResult handleResult(
-  @Nullable CommittedBundle completedBundle,
+  CommittedBundle completedBundle,
   Iterable completedTimers,
   TransformResult result) {
 Iterable> committedBundles =
@@ -162,9 +161,7 @@ public void initialize(
 CopyOnAccessInMemoryStateInternals theirState = result.getState();
 if (theirState != null) {
   CopyOnAccessInMemoryStateInternals committedState = theirState.commit();
-  StepAndKey stepAndKey =
-  StepAndKey.of(
-  result.getTransform(), completedBundle == null ? null : 
completedBundle.getKey());
+  StepAndKey stepAndKey = StepAndKey.of(result.getTransform(), 
completedBundle.getKey());
   if (!committedState.isEmpty()) {
 applicationStateInternals.put(stepAndKey, committedState);
   } else {
@@ -176,7 +173,9 @@ public void initialize(
 watermarkManager.updateWatermarks(
 completedBundle,
 result.getTimerUpdate().withCompletedTimers(completedTimers),
-committedResult,
+committedResult.getExecutable(),
+committedResult.getUnprocessedInputs().orNull(),
+committedResult.getOutputs(),
 result.getWatermarkHold());
 return committedResult;
   }
@@ -188,7 +187,7 @@ public void initialize(
* {@link Optional}.
*/
   private Optional> getUnprocessedInput(
-  @Nullable CommittedBundle completedBundle, TransformResult result) 
{
+  CommittedBundle completedBundle, TransformResult result) {
 if (completedBundle == null || 
Iterables.isEmpty(result.getUnprocessedElements())) {
   return Optional.absent();
 }
diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/QuiescenceDriver.java
 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/QuiescenceDriver.java
index 9ada00e5c8f..e3632697164 100644
--- 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/QuiescenceDriver.java
+++ 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/QuiescenceDriver.java
@@ -162,12 +162,12 @@ private void fireTimers() {
 transformTimers.getKey(),
 (PCollection)
 Iterables.getOnlyElement(
-
transformTimers.getTransform().getInputs().values()))
+
transformTimers.getExecutable().getInputs().values()))
 .add(WindowedValue.valueInGlobalWindow(work))
 .commit(evaluationContext.now());
 outstandingWork.incrementAndGet();
 bundleProcessor.process(
-bundle, transformTimers.getTransform(), new 
TimerIterableCompletionCallback(delivery));
+bundle, transformTimers.getExecutable(), new 
TimerIterableCompletionCallback(delivery));
 state.set(ExecutorState.ACTIVE);
   }
 } catch (Exception e) {
diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/WatermarkManager.java
 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/WatermarkManager.java
index 882bdc5cbc9..86e904655ce 100644
--- 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/WatermarkManager.java
+++ 

[beam] 01/01: Merge pull request #5177: Stop taking the whole result in WatermarkManager

2018-04-19 Thread tgroh
This is an automated email from the ASF dual-hosted git repository.

tgroh pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 3eacf8b8d7301d1bd4db17874a31caf3b9949526
Merge: 0e87ba1 f0c5d4c
Author: Thomas Groh 
AuthorDate: Thu Apr 19 20:56:17 2018 -0700

Merge pull request #5177: Stop taking the whole result in WatermarkManager

[BEAM-4135]

 .../beam/runners/direct/EvaluationContext.java |  13 +-
 .../beam/runners/direct/QuiescenceDriver.java  |   4 +-
 .../beam/runners/direct/WatermarkManager.java  | 148 ---
 .../beam/runners/direct/WatermarkManagerTest.java  | 481 ++---
 4 files changed, 312 insertions(+), 334 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
tg...@apache.org.


[beam] branch master updated (0e87ba1 -> 3eacf8b)

2018-04-19 Thread tgroh
This is an automated email from the ASF dual-hosted git repository.

tgroh pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 0e87ba1  Merge pull request #5190: Add pubsub to Go maven build
 add 291745c  Remove Unused WatermarkManager Method
 add f0c5d4c  Stop taking the whole result in WatermarkManager
 new 3eacf8b  Merge pull request #5177: Stop taking the whole result in 
WatermarkManager

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../beam/runners/direct/EvaluationContext.java |  13 +-
 .../beam/runners/direct/QuiescenceDriver.java  |   4 +-
 .../beam/runners/direct/WatermarkManager.java  | 148 ---
 .../beam/runners/direct/WatermarkManagerTest.java  | 481 ++---
 4 files changed, 312 insertions(+), 334 deletions(-)



[jira] [Work logged] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3773?focusedWorklogId=93057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93057
 ]

ASF GitHub Bot logged work on BEAM-3773:


Author: ASF GitHub Bot
Created on: 20/Apr/18 03:41
Start Date: 20/Apr/18 03:41
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#5173: [BEAM-3773][SQL] Add EnumerableConverter for JDBC support
URL: https://github.com/apache/beam/pull/5173#discussion_r182939167
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamEnumerableConverter.java
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rel;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentLinkedQueue;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.PipelineResult;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.MetricNameFilter;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.metrics.MetricsFilter;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollectionTuple;
+import org.apache.beam.sdk.values.Row;
+import org.apache.calcite.adapter.enumerable.EnumerableRel;
+import org.apache.calcite.adapter.enumerable.EnumerableRelImplementor;
+import org.apache.calcite.adapter.enumerable.PhysType;
+import org.apache.calcite.adapter.enumerable.PhysTypeImpl;
+import org.apache.calcite.linq4j.Enumerable;
+import org.apache.calcite.linq4j.Linq4j;
+import org.apache.calcite.linq4j.tree.BlockBuilder;
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.plan.ConventionTraitDef;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptCost;
+import org.apache.calcite.plan.RelOptPlanner;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.convert.ConverterImpl;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.calcite.rel.type.RelDataType;
+
+/**
+ * BeamRelNode to replace a {@code Enumerable} node.
+ */
+public class BeamEnumerableConverter extends ConverterImpl implements 
EnumerableRel {
+
+  public BeamEnumerableConverter(RelOptCluster cluster, RelTraitSet traits, 
RelNode input) {
+super(cluster, ConventionTraitDef.INSTANCE, traits, input);
+  }
+
+  @Override
+  public RelNode copy(RelTraitSet traitSet, List inputs) {
+return new BeamEnumerableConverter(getCluster(), traitSet, sole(inputs));
+  }
+
+  @Override
+  public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery 
mq) {
+// This should always be a last resort.
+return planner.getCostFactory().makeHugeCost();
+  }
+
+  @Override
+  public Result implement(EnumerableRelImplementor implementor, Prefer prefer) 
{
+final BlockBuilder list = new BlockBuilder();
+final RelDataType rowType = getRowType();
+final PhysType physType =
+PhysTypeImpl.of(implementor.getTypeFactory(), rowType, 
prefer.preferArray());
+final Expression node = implementor.stash((BeamRelNode) getInput(), 
BeamRelNode.class);
+list.add(Expressions.call(BeamEnumerableConverter.class, "toEnumerable", 
node));
+return implementor.result(physType, list.toBlock());
+  }
+
+  public static Enumerable toEnumerable(BeamRelNode node) {
+PipelineOptions options = PipelineOptionsFactory.create();
+if (node instanceof BeamIOSinkRel) {
+  return count(options, node);
+}
+return collect(options, node);
+  }
+
+  private static PipelineResult run(PipelineOptions options, BeamRelNode node,
+  DoFn doFn) {
+

[jira] [Work logged] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3773?focusedWorklogId=93058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93058
 ]

ASF GitHub Bot logged work on BEAM-3773:


Author: ASF GitHub Bot
Created on: 20/Apr/18 03:41
Start Date: 20/Apr/18 03:41
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#5173: [BEAM-3773][SQL] Add EnumerableConverter for JDBC support
URL: https://github.com/apache/beam/pull/5173#discussion_r182938987
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamEnumerableConverter.java
 ##
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rel;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentLinkedQueue;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.PipelineResult;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.MetricNameFilter;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.metrics.MetricsFilter;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollectionTuple;
+import org.apache.beam.sdk.values.Row;
+import org.apache.calcite.adapter.enumerable.EnumerableRel;
+import org.apache.calcite.adapter.enumerable.EnumerableRelImplementor;
+import org.apache.calcite.adapter.enumerable.PhysType;
+import org.apache.calcite.adapter.enumerable.PhysTypeImpl;
+import org.apache.calcite.linq4j.Enumerable;
+import org.apache.calcite.linq4j.Linq4j;
+import org.apache.calcite.linq4j.tree.BlockBuilder;
+import org.apache.calcite.linq4j.tree.Expression;
+import org.apache.calcite.linq4j.tree.Expressions;
+import org.apache.calcite.plan.ConventionTraitDef;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelOptCost;
+import org.apache.calcite.plan.RelOptPlanner;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.convert.ConverterImpl;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.calcite.rel.type.RelDataType;
+
+/**
+ * BeamRelNode to replace a {@code Enumerable} node.
+ */
+public class BeamEnumerableConverter extends ConverterImpl implements 
EnumerableRel {
+
+  public BeamEnumerableConverter(RelOptCluster cluster, RelTraitSet traits, 
RelNode input) {
+super(cluster, ConventionTraitDef.INSTANCE, traits, input);
+  }
+
+  @Override
+  public RelNode copy(RelTraitSet traitSet, List inputs) {
+return new BeamEnumerableConverter(getCluster(), traitSet, sole(inputs));
+  }
+
+  @Override
+  public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery 
mq) {
+// This should always be a last resort.
+return planner.getCostFactory().makeHugeCost();
+  }
+
+  @Override
+  public Result implement(EnumerableRelImplementor implementor, Prefer prefer) 
{
+final BlockBuilder list = new BlockBuilder();
+final RelDataType rowType = getRowType();
+final PhysType physType =
+PhysTypeImpl.of(implementor.getTypeFactory(), rowType, 
prefer.preferArray());
+final Expression node = implementor.stash((BeamRelNode) getInput(), 
BeamRelNode.class);
+list.add(Expressions.call(BeamEnumerableConverter.class, "toEnumerable", 
node));
+return implementor.result(physType, list.toBlock());
+  }
+
+  public static Enumerable toEnumerable(BeamRelNode node) {
+PipelineOptions options = PipelineOptionsFactory.create();
 
 Review comment:
   Just checking - since `Collector` only works on the `DirectRunner` it seems 
fine to hardcode it here. But are the options specified elsewhere when using 
SQL so here it would just be validation that the configuration is 

Jenkins build is back to normal : beam_PostCommit_Java_GradleBuild #130

2018-04-19 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Apex_Gradle #152

2018-04-19 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PostCommit_Java_GradleBuild #127

2018-04-19 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=93037&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93037
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 20/Apr/18 02:09
Start Date: 20/Apr/18 02:09
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r182923874
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ControlClientPool.java
 ##
 @@ -17,16 +17,31 @@
  */
 package org.apache.beam.runners.fnexecution.control;
 
-import org.apache.beam.sdk.fn.function.ThrowingConsumer;
-import org.apache.beam.sdk.util.ThrowingSupplier;
+/**
+ * A pool of control clients that brokers incoming SDK harness connections (in 
the form of {@link
+ * InstructionRequestHandler InstructionRequestHandlers}.
+ *
+ * Incoming instruction handlers usually come from the control plane gRPC 
service. Typical use:
+ *
+ * 
+ *   // Within owner of the pool, who may or may not own the control plane 
server as well
+ *   ControlClientPool pool = ...
+ *   FnApiControlClientPoolService service =
+ *   FnApiControlClientPoolService.offeringClientsToSink(pool.getSink(), 
headerAccessor)
+ *   // Incoming gRPC control connections will now be added to the client pool.
+ *
+ *   // Within code that interacts with the instruction handler. The get call 
blocks until an
+ *   // incoming client is available:
+ *   ControlClientSource clientSource = ... InstructionRequestHandler
+ *   instructionHandler = clientSource.get("worker-id");
+ * 
+ */
+public interface ControlClientPool {
 
-/** Control client pool that exposes a source and sink of control clients. */
-public interface ControlClientPool {
+  /** Sink for control clients. */
+  ControlClientSink getSink();
 
 Review comment:
   Can we change 
   getSink => getSinkPool
   getSource => getSourcePool
   
   Better yet, we can make
   `
   Function getSinkPool();
   Function 
getSourcePool();
   `




Issue Time Tracking
---

Worklog Id: (was: 93037)
Time Spent: 12h 10m  (was: 12h)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments





[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=93036&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93036
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 20/Apr/18 02:09
Start Date: 20/Apr/18 02:09
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r182927203
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ControlClientSink.java
 ##
 @@ -0,0 +1,29 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.control;
+
+/** A sink for {@link InstructionRequestHandler InstructionRequestHandlers} 
keyed by worker id. */
+@FunctionalInterface
+public interface ControlClientSink {
 
 Review comment:
   ControlClientSink => ControlClientSinkPool
   If it is a pool, then the get method can return a ThrowingConsumer.
   Usage will be like:
   `
ThrowingConsumer sink = 
controlClientSink.get(workerId);
   `
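
A minimal sketch of the per-worker sink idea in the comment above, under stated assumptions: `SinkPoolSketch`, `Handler`, and `lookup` are hypothetical names invented for illustration, `Handler` stands in for `InstructionRequestHandler`, and `java.util.function.Consumer` is used in place of Beam's `ThrowingConsumer` so the example needs only the JDK. It is not the API proposed in the PR.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical pool that hands out one sink per worker id.
final class SinkPoolSketch {
  /** Hypothetical stand-in for InstructionRequestHandler. */
  interface Handler {
    String describe();
  }

  // Handlers that have connected so far, keyed by worker id.
  private final Map<String, Handler> clients = new ConcurrentHashMap<>();

  /** Returns a sink bound to a single worker id; accepting a handler registers it. */
  Consumer<Handler> get(String workerId) {
    return handler -> clients.put(workerId, handler);
  }

  /** Returns the handler registered for the worker id, or null if none has connected. */
  Handler lookup(String workerId) {
    return clients.get(workerId);
  }
}
```

With this shape, the code that accepts a connection would call `pool.get(workerId).accept(handler)` and would not need to pass the worker id again when the handler arrives.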
   




Issue Time Tracking
---

Worklog Id: (was: 93036)
Time Spent: 12h  (was: 11h 50m)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments





[jira] [Work logged] (BEAM-4018) Add a ByteKeyRangeTracker based on RestrictionTracker for SDF

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4018?focusedWorklogId=93033&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93033
 ]

ASF GitHub Bot logged work on BEAM-4018:


Author: ASF GitHub Bot
Created on: 20/Apr/18 02:09
Start Date: 20/Apr/18 02:09
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#5175: [BEAM-4018] Add a ByteKeyRangeTracker based on RestrictionTracker for SDF
URL: https://github.com/apache/beam/pull/5175#discussion_r182930323
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java
 ##
 @@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms.splittabledofn;
+
+import static com.google.common.base.Preconditions.checkArgument;
+import static com.google.common.base.Preconditions.checkNotNull;
+import static com.google.common.base.Preconditions.checkState;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.MoreObjects;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.io.range.ByteKey;
+import org.apache.beam.sdk.io.range.ByteKeyRange;
+import org.apache.beam.sdk.transforms.DoFn;
+
+/**
+ * A {@link RestrictionTracker} for claiming {@link ByteKey}s in a {@link 
ByteKeyRange} in a
+ * monotonically increasing fashion.
+ */
+public class ByteKeyRangeTracker extends RestrictionTracker {
+  private ByteKeyRange range;
+  @Nullable private ByteKey lastClaimedKey = null;
+  @Nullable private ByteKey lastAttemptedKey = null;
+
+  private ByteKeyRangeTracker(ByteKeyRange range) {
+this.range = checkNotNull(range);
+  }
+
+  /**
+   * Instantiates a new {@link ByteKeyRangeTracker} with the specified range. 
The keys in the range
+   * are left padded to be the same length in bytes.
+   */
+  public static ByteKeyRangeTracker of(ByteKeyRange range) {
+return new ByteKeyRangeTracker(ByteKeyRange.of(range.getStartKey(), 
range.getEndKey()));
+  }
+
+  @Override
+  public synchronized ByteKeyRange currentRestriction() {
+return range;
+  }
+
+  @Override
+  public synchronized ByteKeyRange checkpoint() {
+checkState(lastClaimedKey != null, "Can't checkpoint before any successful 
claim");
+final ByteKey nextKey = next(lastClaimedKey, range);
+// hack to force range to be bigger than the range upper bundle because 
ByteKeyRange *start*
+// must always be less than *end*.
+ByteKey rangeEndKey =
+(nextKey.equals(range.getEndKey())) ? next(range.getEndKey(), range) : 
range.getEndKey();
+ByteKeyRange res = ByteKeyRange.of(nextKey, rangeEndKey);
+this.range = ByteKeyRange.of(range.getStartKey(), nextKey);
+return res;
+  }
+
+  /**
+   * Attempts to claim the given key.
+   *
+   * Must be larger than the last successfully claimed key.
+   *
+   * @return {@code true} if the key was successfully claimed, {@code false} 
if it is outside the
+   * current {@link ByteKeyRange} of this tracker (in that case this 
operation is a no-op).
+   */
+  @Override
+  protected synchronized boolean tryClaimImpl(ByteKey key) {
+checkArgument(
+lastAttemptedKey == null || key.compareTo(lastAttemptedKey) > 0,
+"Trying to claim key %s while last attempted was %s",
+key,
+lastAttemptedKey);
+checkArgument(
+key.compareTo(range.getStartKey()) > -1,
+"Trying to claim key %s before start of the range %s",
+key,
+range);
+lastAttemptedKey = key;
+// No respective checkArgument for i < range.to() - it's ok to try 
claiming keys beyond
+if (!range.getEndKey().isEmpty() && key.compareTo(range.getEndKey()) > -1) 
{
+  return false;
+}
+lastClaimedKey = key;
+return true;
+  }
+
+  /**
+   * Marks that there are no more keys to be claimed in the range.
+   *
+   * E.g., a {@link DoFn} reading a file and claiming the key of each 
record in the file might
+   * call this if it hits EOF - even though the last attempted claim was 
before the end of the
+   * 

[jira] [Work logged] (BEAM-4018) Add a ByteKeyRangeTracker based on RestrictionTracker for SDF

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4018?focusedWorklogId=93032&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93032
 ]

ASF GitHub Bot logged work on BEAM-4018:


Author: ASF GitHub Bot
Created on: 20/Apr/18 02:09
Start Date: 20/Apr/18 02:09
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#5175: [BEAM-4018] Add a ByteKeyRangeTracker based on RestrictionTracker for SDF
URL: https://github.com/apache/beam/pull/5175#discussion_r182923302
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java
 ##
 @@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms.splittabledofn;
+
+import static com.google.common.base.Preconditions.checkArgument;
+import static com.google.common.base.Preconditions.checkNotNull;
+import static com.google.common.base.Preconditions.checkState;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.MoreObjects;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.io.range.ByteKey;
+import org.apache.beam.sdk.io.range.ByteKeyRange;
+import org.apache.beam.sdk.transforms.DoFn;
+
+/**
+ * A {@link RestrictionTracker} for claiming {@link ByteKey}s in a {@link 
ByteKeyRange} in a
+ * monotonically increasing fashion.
+ */
+public class ByteKeyRangeTracker extends RestrictionTracker {
+  private ByteKeyRange range;
+  @Nullable private ByteKey lastClaimedKey = null;
+  @Nullable private ByteKey lastAttemptedKey = null;
+
+  private ByteKeyRangeTracker(ByteKeyRange range) {
+this.range = checkNotNull(range);
+  }
+
+  /**
+   * Instantiates a new {@link ByteKeyRangeTracker} with the specified range. 
The keys in the range
+   * are left padded to be the same length in bytes.
+   */
+  public static ByteKeyRangeTracker of(ByteKeyRange range) {
+return new ByteKeyRangeTracker(ByteKeyRange.of(range.getStartKey(), 
range.getEndKey()));
+  }
+
+  @Override
+  public synchronized ByteKeyRange currentRestriction() {
+return range;
+  }
+
+  @Override
+  public synchronized ByteKeyRange checkpoint() {
+checkState(lastClaimedKey != null, "Can't checkpoint before any successful 
claim");
+final ByteKey nextKey = next(lastClaimedKey, range);
+// hack to force range to be bigger than the range upper bundle because 
ByteKeyRange *start*
+// must always be less than *end*.
+ByteKey rangeEndKey =
+(nextKey.equals(range.getEndKey())) ? next(range.getEndKey(), range) : 
range.getEndKey();
+ByteKeyRange res = ByteKeyRange.of(nextKey, rangeEndKey);
+this.range = ByteKeyRange.of(range.getStartKey(), nextKey);
 
 Review comment:
   Assert that nextKey <= range.getEndKey()?
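
One way the suggested assertion could look, as a sketch only. It works on raw `byte[]` keys instead of Beam's `ByteKey` so that it runs on the JDK alone. `CheckpointBoundsSketch`, `compareKeys`, and `checkNextKeyWithinRange` are hypothetical names, and the unsigned lexicographic ordering is an assumption about how keys compare. An empty end key is treated as unbounded, matching the `range.getEndKey().isEmpty()` check in `tryClaimImpl` in the diff.

```java
// Hypothetical helper; not part of the PR.
final class CheckpointBoundsSketch {
  /** Unsigned lexicographic comparison of two byte keys (assumed ordering). */
  static int compareKeys(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    // On a shared prefix, the shorter key sorts first.
    return Integer.compare(a.length, b.length);
  }

  /** Throws if nextKey lies beyond endKey; an empty endKey means the range is unbounded. */
  static void checkNextKeyWithinRange(byte[] nextKey, byte[] endKey) {
    if (endKey.length != 0 && compareKeys(nextKey, endKey) > 0) {
      throw new IllegalStateException("nextKey is past the end of the range");
    }
  }
}
```

In the tracker itself the equivalent would presumably be a `checkState` on `nextKey.compareTo(range.getEndKey())` placed before the two `ByteKeyRange.of` calls.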




Issue Time Tracking
---

Worklog Id: (was: 93032)

> Add a ByteKeyRangeTracker based on RestrictionTracker for SDF
> -
>
> Key: BEAM-4018
> URL: https://issues.apache.org/jira/browse/BEAM-4018
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We can have a RestrictionTracker for ByteKey ranges as part of the core sdk 
> so it can be reused by future SDF based IOs like Bigtable, HBase among others.





[jira] [Work logged] (BEAM-4018) Add a ByteKeyRangeTracker based on RestrictionTracker for SDF

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4018?focusedWorklogId=93031&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93031
 ]

ASF GitHub Bot logged work on BEAM-4018:


Author: ASF GitHub Bot
Created on: 20/Apr/18 02:09
Start Date: 20/Apr/18 02:09
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#5175: [BEAM-4018] Add a ByteKeyRangeTracker based on RestrictionTracker for SDF
URL: https://github.com/apache/beam/pull/5175#discussion_r182921793
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java
 ##
 @@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms.splittabledofn;
+
+import static com.google.common.base.Preconditions.checkArgument;
+import static com.google.common.base.Preconditions.checkNotNull;
+import static com.google.common.base.Preconditions.checkState;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.MoreObjects;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.io.range.ByteKey;
+import org.apache.beam.sdk.io.range.ByteKeyRange;
+import org.apache.beam.sdk.transforms.DoFn;
+
+/**
+ * A {@link RestrictionTracker} for claiming {@link ByteKey}s in a {@link 
ByteKeyRange} in a
+ * monotonically increasing fashion.
+ */
+public class ByteKeyRangeTracker extends RestrictionTracker {
+  private ByteKeyRange range;
+  @Nullable private ByteKey lastClaimedKey = null;
+  @Nullable private ByteKey lastAttemptedKey = null;
+
+  private ByteKeyRangeTracker(ByteKeyRange range) {
+this.range = checkNotNull(range);
+  }
+
+  /**
+   * Instantiates a new {@link ByteKeyRangeTracker} with the specified range. 
The keys in the range
+   * are left padded to be the same length in bytes.
+   */
+  public static ByteKeyRangeTracker of(ByteKeyRange range) {
+return new ByteKeyRangeTracker(ByteKeyRange.of(range.getStartKey(), 
range.getEndKey()));
+  }
+
+  @Override
+  public synchronized ByteKeyRange currentRestriction() {
+return range;
+  }
+
+  @Override
+  public synchronized ByteKeyRange checkpoint() {
+checkState(lastClaimedKey != null, "Can't checkpoint before any successful 
claim");
+final ByteKey nextKey = next(lastClaimedKey, range);
+// hack to force range to be bigger than the range upper bundle because 
ByteKeyRange *start*
+// must always be less than *end*.
+ByteKey rangeEndKey =
 
 Review comment:
   This looks incorrect. Previously range.getEndKey() was not supposed to be a 
part of the residual. Now it is. I think you should just create a range where 
start == end and let the caller figure it out.
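
A sketch of the alternative described above: split at the next key and return a residual that may be empty (start == end) instead of extending it past the original end key. `ResidualSketch`, `KeyRange`, and `splitAt` are hypothetical names. `KeyRange` is a simplified stand-in that permits start == end, which the code comment in the diff says `ByteKeyRange` does not, and it ignores the convention that an empty end key means unbounded.

```java
import java.util.Arrays;

// Hypothetical types for illustration; not Beam classes.
final class ResidualSketch {
  /** Half-open range [start, end) over byte keys; start == end is allowed and means empty. */
  static final class KeyRange {
    final byte[] start;
    final byte[] end;

    KeyRange(byte[] start, byte[] end) {
      this.start = start.clone();
      this.end = end.clone();
    }

    boolean isEmpty() {
      return Arrays.equals(start, end);
    }
  }

  /** Splits at nextKey: element 0 is the primary [start, nextKey), element 1 the residual [nextKey, end). */
  static KeyRange[] splitAt(KeyRange range, byte[] nextKey) {
    return new KeyRange[] {
      new KeyRange(range.start, nextKey), new KeyRange(nextKey, range.end)
    };
  }
}
```

A caller would then test `isEmpty()` on the residual and skip scheduling it, so the original end key never becomes part of the residual.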




Issue Time Tracking
---

Worklog Id: (was: 93031)
Time Spent: 50m  (was: 40m)

> Add a ByteKeyRangeTracker based on RestrictionTracker for SDF
> -
>
> Key: BEAM-4018
> URL: https://issues.apache.org/jira/browse/BEAM-4018
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We can have a RestrictionTracker for ByteKey ranges as part of the core sdk 
> so it can be reused by future SDF based IOs like Bigtable, HBase among others.





[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=93040&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93040
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 20/Apr/18 02:09
Start Date: 20/Apr/18 02:09
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r182928850
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.Arrays;
+import java.util.List;
+import java.util.function.Supplier;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.control.ControlClientSource;
+import 
org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService;
+import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import 
org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+
+/**
+ * An {@link EnvironmentFactory} that creates docker containers by shelling 
out to docker. Returned
+ * {@link RemoteEnvironment RemoteEnvironments} own their respective docker 
containers. Not
+ * thread-safe.
+ */
+public class DockerEnvironmentFactory implements EnvironmentFactory {
+
+  public static DockerEnvironmentFactory forServices(
+  DockerWrapper docker,
+  GrpcFnServer controlServiceServer,
+  GrpcFnServer loggingServiceServer,
+  GrpcFnServer retrievalServiceServer,
+  GrpcFnServer provisioningServiceServer,
+  ControlClientSource clientSource,
+  // TODO: Refine this to IdGenerator when we determine where that should 
live.
+  Supplier idGenerator) {
+return new DockerEnvironmentFactory(
+docker,
+controlServiceServer,
+loggingServiceServer,
+retrievalServiceServer,
+provisioningServiceServer,
+idGenerator,
+clientSource);
+  }
+
+  private final DockerWrapper docker;
+  private final GrpcFnServer 
controlServiceServer;
+  private final GrpcFnServer loggingServiceServer;
+  private final GrpcFnServer retrievalServiceServer;
+  private final GrpcFnServer 
provisioningServiceServer;
+  private final Supplier idGenerator;
+  private final ControlClientSource clientSource;
+
+  private DockerEnvironmentFactory(
+  DockerWrapper docker,
+  GrpcFnServer controlServiceServer,
+  GrpcFnServer loggingServiceServer,
+  GrpcFnServer retrievalServiceServer,
+  GrpcFnServer provisioningServiceServer,
+  Supplier idGenerator,
+  ControlClientSource clientSource) {
+this.docker = docker;
+this.controlServiceServer = controlServiceServer;
+this.loggingServiceServer = loggingServiceServer;
+this.retrievalServiceServer = retrievalServiceServer;
+this.provisioningServiceServer = provisioningServiceServer;
+this.idGenerator = idGenerator;
+this.clientSource = clientSource;
+  }
+
+  /** Creates a new, active {@link RemoteEnvironment} backed by a local Docker 
container. */
+  @Override
+  public RemoteEnvironment createEnvironment(Environment environment) throws 
Exception {
+String workerId = idGenerator.get();
+
+// Prepare docker invocation.
+Path workerPersistentDirectory = 
Files.createTempDirectory("worker_persistent_directory");
+Path semiPersistentDirectory = 
Files.createTempDirectory("semi_persistent_dir");
+String containerImage = environment.getUrl();
+// TODO: https://issues.apache.org/jira/browse/BEAM-4148 The default 
service address will not
+// work for Docker for Mac.
+ 

[jira] [Work logged] (BEAM-4018) Add a ByteKeyRangeTracker based on RestrictionTracker for SDF

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4018?focusedWorklogId=93030&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93030
 ]

ASF GitHub Bot logged work on BEAM-4018:


Author: ASF GitHub Bot
Created on: 20/Apr/18 02:09
Start Date: 20/Apr/18 02:09
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#5175: [BEAM-4018] Add a ByteKeyRangeTracker based on RestrictionTracker for SDF
URL: https://github.com/apache/beam/pull/5175#discussion_r182929585
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java
 ##
 @@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms.splittabledofn;
+
+import static com.google.common.base.Preconditions.checkArgument;
+import static com.google.common.base.Preconditions.checkNotNull;
+import static com.google.common.base.Preconditions.checkState;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.MoreObjects;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.io.range.ByteKey;
+import org.apache.beam.sdk.io.range.ByteKeyRange;
+import org.apache.beam.sdk.transforms.DoFn;
+
+/**
+ * A {@link RestrictionTracker} for claiming {@link ByteKey}s in a {@link 
ByteKeyRange} in a
+ * monotonically increasing fashion.
+ */
+public class ByteKeyRangeTracker extends RestrictionTracker {
+  private ByteKeyRange range;
+  @Nullable private ByteKey lastClaimedKey = null;
+  @Nullable private ByteKey lastAttemptedKey = null;
+
+  private ByteKeyRangeTracker(ByteKeyRange range) {
+this.range = checkNotNull(range);
+  }
+
+  /**
+   * Instantiates a new {@link ByteKeyRangeTracker} with the specified range. 
The keys in the range
+   * are left padded to be the same length in bytes.
+   */
+  public static ByteKeyRangeTracker of(ByteKeyRange range) {
+return new ByteKeyRangeTracker(ByteKeyRange.of(range.getStartKey(), 
range.getEndKey()));
+  }
+
+  @Override
+  public synchronized ByteKeyRange currentRestriction() {
+return range;
+  }
+
+  @Override
+  public synchronized ByteKeyRange checkpoint() {
+checkState(lastClaimedKey != null, "Can't checkpoint before any successful 
claim");
+final ByteKey nextKey = next(lastClaimedKey, range);
+// hack to force range to be bigger than the range upper bundle because 
ByteKeyRange *start*
+// must always be less than *end*.
+ByteKey rangeEndKey =
+(nextKey.equals(range.getEndKey())) ? next(range.getEndKey(), range) : 
range.getEndKey();
+ByteKeyRange res = ByteKeyRange.of(nextKey, rangeEndKey);
+this.range = ByteKeyRange.of(range.getStartKey(), nextKey);
+return res;
+  }
+
+  /**
+   * Attempts to claim the given key.
+   *
+   * Must be larger than the last successfully claimed key.
+   *
+   * @return {@code true} if the key was successfully claimed, {@code false} if it is outside the
+   * current {@link ByteKeyRange} of this tracker (in that case this operation is a no-op).
+   */
+  @Override
+  protected synchronized boolean tryClaimImpl(ByteKey key) {
+checkArgument(
+lastAttemptedKey == null || key.compareTo(lastAttemptedKey) > 0,
+"Trying to claim key %s while last attempted was %s",
+key,
+lastAttemptedKey);
+checkArgument(
+key.compareTo(range.getStartKey()) > -1,
+"Trying to claim key %s before start of the range %s",
+key,
+range);
+lastAttemptedKey = key;
+// No respective checkArgument for i < range.to() - it's ok to try claiming keys beyond
+if (!range.getEndKey().isEmpty() && key.compareTo(range.getEndKey()) > -1) {
+  return false;
+}
+lastClaimedKey = key;
+return true;
+  }
+
+  /**
+   * Marks that there are no more keys to be claimed in the range.
+   *
+   * E.g., a {@link DoFn} reading a file and claiming the key of each record in the file might
+   * call this if it hits EOF - even though the last attempted claim was before the end of the
+   * 

[jira] [Work logged] (BEAM-4018) Add a ByteKeyRangeTracker based on RestrictionTracker for SDF

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4018?focusedWorklogId=93034=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93034
 ]

ASF GitHub Bot logged work on BEAM-4018:


Author: ASF GitHub Bot
Created on: 20/Apr/18 02:09
Start Date: 20/Apr/18 02:09
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#5175: [BEAM-4018] Add a ByteKeyRangeTracker based on RestrictionTracker for SDF
URL: https://github.com/apache/beam/pull/5175#discussion_r182929659
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java
 ##
 @@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms.splittabledofn;
+
+import static com.google.common.base.Preconditions.checkArgument;
+import static com.google.common.base.Preconditions.checkNotNull;
+import static com.google.common.base.Preconditions.checkState;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.MoreObjects;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.io.range.ByteKey;
+import org.apache.beam.sdk.io.range.ByteKeyRange;
+import org.apache.beam.sdk.transforms.DoFn;
+
+/**
+ * A {@link RestrictionTracker} for claiming {@link ByteKey}s in a {@link ByteKeyRange} in a
+ * monotonically increasing fashion.
+ */
+public class ByteKeyRangeTracker extends RestrictionTracker {
+  private ByteKeyRange range;
+  @Nullable private ByteKey lastClaimedKey = null;
+  @Nullable private ByteKey lastAttemptedKey = null;
+
+  private ByteKeyRangeTracker(ByteKeyRange range) {
+this.range = checkNotNull(range);
+  }
+
+  /**
+   * Instantiates a new {@link ByteKeyRangeTracker} with the specified range. The keys in the range
+   * are left padded to be the same length in bytes.
+   */
+  public static ByteKeyRangeTracker of(ByteKeyRange range) {
+return new ByteKeyRangeTracker(ByteKeyRange.of(range.getStartKey(), range.getEndKey()));
+  }
+
+  @Override
+  public synchronized ByteKeyRange currentRestriction() {
+return range;
+  }
+
+  @Override
+  public synchronized ByteKeyRange checkpoint() {
+checkState(lastClaimedKey != null, "Can't checkpoint before any successful claim");
+final ByteKey nextKey = next(lastClaimedKey, range);
+// hack to force range to be bigger than the range upper bundle because ByteKeyRange *start*
+// must always be less than *end*.
+ByteKey rangeEndKey =
+(nextKey.equals(range.getEndKey())) ? next(range.getEndKey(), range) : range.getEndKey();
+ByteKeyRange res = ByteKeyRange.of(nextKey, rangeEndKey);
+this.range = ByteKeyRange.of(range.getStartKey(), nextKey);
+return res;
+  }
+
+  /**
+   * Attempts to claim the given key.
+   *
+   * Must be larger than the last successfully claimed key.
+   *
+   * @return {@code true} if the key was successfully claimed, {@code false} if it is outside the
+   * current {@link ByteKeyRange} of this tracker (in that case this operation is a no-op).
+   */
+  @Override
+  protected synchronized boolean tryClaimImpl(ByteKey key) {
+checkArgument(
+lastAttemptedKey == null || key.compareTo(lastAttemptedKey) > 0,
+"Trying to claim key %s while last attempted was %s",
+key,
+lastAttemptedKey);
+checkArgument(
+key.compareTo(range.getStartKey()) > -1,
+"Trying to claim key %s before start of the range %s",
+key,
+range);
+lastAttemptedKey = key;
+// No respective checkArgument for i < range.to() - it's ok to try claiming keys beyond
+if (!range.getEndKey().isEmpty() && key.compareTo(range.getEndKey()) > -1) {
+  return false;
+}
+lastClaimedKey = key;
+return true;
+  }
+
+  /**
+   * Marks that there are no more keys to be claimed in the range.
+   *
+   * E.g., a {@link DoFn} reading a file and claiming the key of each record in the file might
+   * call this if it hits EOF - even though the last attempted claim was before the end of the
+   * 

[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=93039=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93039
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 20/Apr/18 02:09
Start Date: 20/Apr/18 02:09
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r182928389
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientPoolService.java
 ##
 @@ -79,7 +76,7 @@ public static FnApiControlClientPoolService offeringClientsToPool(
   // discarded, which should be performed by a call to #shutdownNow. The remote caller must be
   // able to handle an unexpectedly terminated connection.
   vendedClients.add(newClient);
 
 Review comment:
   Incoming client connections are not serialized and we can get multiple 
client calls in parallel.
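
To illustrate the concern raised in this comment: if several clients can connect at the same time, any collection the connection handler writes to has to tolerate concurrent writers. The following is only a minimal sketch under that assumption, not the code in the pull request. `ClientRegistry`, `onClientConnected`, and `simulate` are hypothetical names, and a `ConcurrentLinkedQueue` is just one possible choice of thread-safe collection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;

/** Hypothetical registry of connected clients; not part of Beam. */
final class ClientRegistry {
  // ConcurrentLinkedQueue accepts add() from many threads without external locking.
  private final Queue<String> vendedClients = new ConcurrentLinkedQueue<>();

  /** Called once per incoming connection, possibly from several threads at once. */
  void onClientConnected(String clientId) {
    vendedClients.add(clientId);
  }

  int size() {
    return vendedClients.size();
  }

  /** Simulates parallel incoming connections and returns how many were recorded. */
  static int simulate(int threads, int perThread) {
    ClientRegistry registry = new ClientRegistry();
    CountDownLatch start = new CountDownLatch(1);
    List<Thread> workers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      final int id = t;
      Thread worker = new Thread(() -> {
        try {
          start.await(); // release all workers together to maximize contention
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        for (int i = 0; i < perThread; i++) {
          registry.onClientConnected("client-" + id + "-" + i);
        }
      });
      workers.add(worker);
      worker.start();
    }
    start.countDown();
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for workers", e);
      }
    }
    return registry.size();
  }
}
```

With a non-thread-safe collection such as a plain `ArrayList`, the same simulation could lose entries or throw; whether that risk applies to the code under review depends on the collection type actually used for `vendedClients`, which is not shown in this hunk.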


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 93039)
Time Spent: 12.5h  (was: 12h 20m)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4018) Add a ByteKeyRangeTracker based on RestrictionTracker for SDF

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4018?focusedWorklogId=93035=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93035
 ]

ASF GitHub Bot logged work on BEAM-4018:


Author: ASF GitHub Bot
Created on: 20/Apr/18 02:09
Start Date: 20/Apr/18 02:09
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#5175: [BEAM-4018] Add a ByteKeyRangeTracker based on RestrictionTracker for SDF
URL: https://github.com/apache/beam/pull/5175#discussion_r182924622
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.java
 ##
 @@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms.splittabledofn;
+
+import static com.google.common.base.Preconditions.checkArgument;
+import static com.google.common.base.Preconditions.checkNotNull;
+import static com.google.common.base.Preconditions.checkState;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.MoreObjects;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.io.range.ByteKey;
+import org.apache.beam.sdk.io.range.ByteKeyRange;
+import org.apache.beam.sdk.transforms.DoFn;
+
+/**
+ * A {@link RestrictionTracker} for claiming {@link ByteKey}s in a {@link ByteKeyRange} in a
+ * monotonically increasing fashion.
+ */
+public class ByteKeyRangeTracker extends RestrictionTracker {
+  private ByteKeyRange range;
+  @Nullable private ByteKey lastClaimedKey = null;
+  @Nullable private ByteKey lastAttemptedKey = null;
+
+  private ByteKeyRangeTracker(ByteKeyRange range) {
+this.range = checkNotNull(range);
+  }
+
+  /**
+   * Instantiates a new {@link ByteKeyRangeTracker} with the specified range. The keys in the range
+   * are left padded to be the same length in bytes.
+   */
+  public static ByteKeyRangeTracker of(ByteKeyRange range) {
+return new ByteKeyRangeTracker(ByteKeyRange.of(range.getStartKey(), range.getEndKey()));
+  }
+
+  @Override
+  public synchronized ByteKeyRange currentRestriction() {
+return range;
+  }
+
+  @Override
+  public synchronized ByteKeyRange checkpoint() {
+checkState(lastClaimedKey != null, "Can't checkpoint before any successful claim");
+final ByteKey nextKey = next(lastClaimedKey, range);
+// hack to force range to be bigger than the range upper bundle because ByteKeyRange *start*
+// must always be less than *end*.
+ByteKey rangeEndKey =
+(nextKey.equals(range.getEndKey())) ? next(range.getEndKey(), range) : range.getEndKey();
+ByteKeyRange res = ByteKeyRange.of(nextKey, rangeEndKey);
+this.range = ByteKeyRange.of(range.getStartKey(), nextKey);
+return res;
+  }
+
+  /**
+   * Attempts to claim the given key.
+   *
+   * Must be larger than the last successfully claimed key.
+   *
+   * @return {@code true} if the key was successfully claimed, {@code false} if it is outside the
+   * current {@link ByteKeyRange} of this tracker (in that case this operation is a no-op).
+   */
+  @Override
+  protected synchronized boolean tryClaimImpl(ByteKey key) {
+checkArgument(
+lastAttemptedKey == null || key.compareTo(lastAttemptedKey) > 0,
+"Trying to claim key %s while last attempted was %s",
 
 Review comment:
   We should probably update the error message to clarify why this is a problem.
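
One way to act on this suggestion is to state the violated rule in the message itself. The sketch below is only an illustration: `ClaimOrderCheck` and `checkIncreasing` are hypothetical names, the wording is an assumption rather than what the pull request adopted, and plain Java is used in place of Guava's `checkArgument` so that the snippet stands alone.

```java
/** Hypothetical helper; not part of Beam. */
final class ClaimOrderCheck {
  private ClaimOrderCheck() {}

  /**
   * Rejects a key that is not strictly greater than the last attempted key.
   * A null lastAttemptedKey means no claim has been attempted yet.
   */
  static <K extends Comparable<K>> void checkIncreasing(K key, K lastAttemptedKey) {
    if (lastAttemptedKey != null && key.compareTo(lastAttemptedKey) <= 0) {
      // The message names both keys and the rule that was broken.
      throw new IllegalArgumentException(
          String.format(
              "Trying to claim key %s while last attempted key was %s; "
                  + "keys must be claimed in strictly increasing order",
              key, lastAttemptedKey));
    }
  }
}
```

For example, `ClaimOrderCheck.checkIncreasing("a", "b")` is rejected with a message that explains the ordering requirement, while `ClaimOrderCheck.checkIncreasing("b", "a")` passes.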


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 93035)
Time Spent: 1h 10m  (was: 1h)

> Add a ByteKeyRangeTracker based on RestrictionTracker for SDF
> -
>
> Key: BEAM-4018
> URL: https://issues.apache.org/jira/browse/BEAM-4018
>

[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=93038=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93038
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 20/Apr/18 02:09
Start Date: 20/Apr/18 02:09
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r182929276
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.Arrays;
+import java.util.List;
+import java.util.function.Supplier;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.control.ControlClientSource;
+import org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService;
+import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+
+/**
+ * An {@link EnvironmentFactory} that creates docker containers by shelling out to docker. Returned
+ * {@link RemoteEnvironment RemoteEnvironments} own their respective docker containers. Not
+ * thread-safe.
+ */
+public class DockerEnvironmentFactory implements EnvironmentFactory {
+
+  public static DockerEnvironmentFactory forServices(
+  DockerWrapper docker,
+  GrpcFnServer controlServiceServer,
+  GrpcFnServer loggingServiceServer,
+  GrpcFnServer retrievalServiceServer,
+  GrpcFnServer provisioningServiceServer,
+  ControlClientSource clientSource,
+  // TODO: Refine this to IdGenerator when we determine where that should live.
+  Supplier idGenerator) {
+return new DockerEnvironmentFactory(
+docker,
+controlServiceServer,
+loggingServiceServer,
+retrievalServiceServer,
+provisioningServiceServer,
+idGenerator,
+clientSource);
+  }
+
+  private final DockerWrapper docker;
+  private final GrpcFnServer controlServiceServer;
+  private final GrpcFnServer loggingServiceServer;
+  private final GrpcFnServer retrievalServiceServer;
+  private final GrpcFnServer provisioningServiceServer;
+  private final Supplier idGenerator;
+  private final ControlClientSource clientSource;
+
+  private DockerEnvironmentFactory(
+  DockerWrapper docker,
+  GrpcFnServer controlServiceServer,
+  GrpcFnServer loggingServiceServer,
+  GrpcFnServer retrievalServiceServer,
+  GrpcFnServer provisioningServiceServer,
+  Supplier idGenerator,
+  ControlClientSource clientSource) {
+this.docker = docker;
+this.controlServiceServer = controlServiceServer;
+this.loggingServiceServer = loggingServiceServer;
+this.retrievalServiceServer = retrievalServiceServer;
+this.provisioningServiceServer = provisioningServiceServer;
+this.idGenerator = idGenerator;
+this.clientSource = clientSource;
+  }
+
+  /** Creates a new, active {@link RemoteEnvironment} backed by a local Docker container. */
+  @Override
+  public RemoteEnvironment createEnvironment(Environment environment) throws Exception {
+String workerId = idGenerator.get();
+
+// Prepare docker invocation.
+Path workerPersistentDirectory = Files.createTempDirectory("worker_persistent_directory");
+Path semiPersistentDirectory = Files.createTempDirectory("semi_persistent_dir");
+String containerImage = environment.getUrl();
+// TODO: https://issues.apache.org/jira/browse/BEAM-4148 The default service address will not
+// work for Docker for Mac.
+ 

[beam] 01/01: Merge pull request #5190: Add pubsub to Go maven build

2018-04-19 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 0e87ba17c2fa4d2bdf634e5ccb4e7ac7203571a3
Merge: f80a96e 795037e
Author: Kenn Knowles 
AuthorDate: Thu Apr 19 18:49:26 2018 -0700

Merge pull request #5190: Add pubsub to Go maven build

 sdks/go/pkg/beam/util/pubsubx/pubsub.go | 4 ++--
 sdks/go/pom.xml | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
k...@apache.org.


[beam] branch master updated (f80a96e -> 0e87ba1)

2018-04-19 Thread kenn
This is an automated email from the ASF dual-hosted git repository.

kenn pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from f80a96e  [BEAM-4093] Support Python ValidatesRunner test in streaming (#5147)
 add 795037e  Add pubsub to Go maven build
 new 0e87ba1  Merge pull request #5190: Add pubsub to Go maven build

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/go/pkg/beam/util/pubsubx/pubsub.go | 4 ++--
 sdks/go/pom.xml | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
k...@apache.org.


[jira] [Work logged] (BEAM-3906) Get Python Wheel Validation Automated

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3906?focusedWorklogId=93007=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93007
 ]

ASF GitHub Bot logged work on BEAM-3906:


Author: ASF GitHub Bot
Created on: 20/Apr/18 01:23
Start Date: 20/Apr/18 01:23
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4943: [BEAM-3906] Automate 
Validation Aganist Python Wheel
URL: https://github.com/apache/beam/pull/4943#issuecomment-382933717
 
 
   Run Python ReleaseCandidate


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 93007)
Time Spent: 16h 10m  (was: 16h)

> Get Python Wheel Validation Automated
> -
>
> Key: BEAM-3906
> URL: https://issues.apache.org/jira/browse/BEAM-3906
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-python, testing
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 16h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated: [BEAM-4093] Support Python ValidatesRunner test in streaming (#5147)

2018-04-19 Thread altay
This is an automated email from the ASF dual-hosted git repository.

altay pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
 new f80a96e  [BEAM-4093] Support Python ValidatesRunner test in streaming (#5147)
f80a96e is described below

commit f80a96e5f94c6227226305e28d56912d2c92289d
Author: Mark Liu 
AuthorDate: Thu Apr 19 18:10:36 2018 -0700

[BEAM-4093] Support Python ValidatesRunner test in streaming (#5147)

* [BEAM-4093] Support Python ValidatesRunner test in streaming

* fixit! Remove unnecessary option reset
---
 .../apache_beam/examples/streaming_wordcount_it_test.py |  2 ++
 sdks/python/apache_beam/options/pipeline_options.py |  7 +++
 .../apache_beam/runners/dataflow/test_dataflow_runner.py| 13 -
 sdks/python/apache_beam/testing/test_pipeline.py|  3 ++-
 4 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py b/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py
index d0b53f5..5db1878 100644
--- a/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py
+++ b/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py
@@ -37,6 +37,7 @@ INPUT_SUB = 'wc_subscription_input'
 OUTPUT_SUB = 'wc_subscription_output'
 
 DEFAULT_INPUT_NUMBERS = 500
+WAIT_UNTIL_FINISH_DURATION = 3 * 60 * 1000   # in milliseconds
 
 
 class StreamingWordCountIT(unittest.TestCase):
@@ -87,6 +88,7 @@ class StreamingWordCountIT(unittest.TestCase):
timeout=400)
 extra_opts = {'input_subscription': self.input_sub.full_name,
   'output_topic': self.output_topic.full_name,
+  'wait_until_finish_duration': WAIT_UNTIL_FINISH_DURATION,
   'on_success_matcher': all_of(state_verifier,
pubsub_msg_verifier)}
 
diff --git a/sdks/python/apache_beam/options/pipeline_options.py b/sdks/python/apache_beam/options/pipeline_options.py
index 7a2cd4b..b5f9d77 100644
--- a/sdks/python/apache_beam/options/pipeline_options.py
+++ b/sdks/python/apache_beam/options/pipeline_options.py
@@ -649,6 +649,13 @@ class TestOptions(PipelineOptions):
 default=False,
 help=('Used in unit testing runners without submitting the '
   'actual job.'))
+parser.add_argument(
+'--wait_until_finish_duration',
+default=None,
+type=int,
+help='The time to wait (in milliseconds) for test pipeline to finish. '
+ 'If it is set to None, it will wait indefinitely until the job '
+ 'is finished.')
 
   def validate(self, validator):
 errors = []
diff --git a/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py b/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py
index 765ed24..eedfa60 100644
--- a/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py
+++ b/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py
@@ -18,6 +18,7 @@
 """Wrapper of Beam runners that's built for running and verifying e2e tests."""
 from __future__ import print_function
 
+import logging
 import time
 
 from apache_beam.internal import pickler
@@ -37,6 +38,8 @@ class TestDataflowRunner(DataflowRunner):
 """Execute test pipeline and verify test matcher"""
 options = pipeline._options.view_as(TestOptions)
 on_success_matcher = options.on_success_matcher
+wait_duration = options.wait_until_finish_duration
+is_streaming = options.view_as(StandardOptions).streaming
 
 # [BEAM-1889] Do not send this to remote workers also, there is no need to
 # send this option to remote executors.
@@ -49,10 +52,11 @@ class TestDataflowRunner(DataflowRunner):
   print('Found: %s.' % self.build_console_url(pipeline.options))
 
 try:
-  if not options.view_as(StandardOptions).streaming:
-self.result.wait_until_finish()
-  else:
-self.wait_until_in_state(PipelineState.RUNNING)
+  self.wait_until_in_state(PipelineState.RUNNING)
+
+  if is_streaming and not wait_duration:
+logging.warning('Waiting indefinitely for streaming job.')
+  self.result.wait_until_finish(duration=wait_duration)
 
   if on_success_matcher:
 from hamcrest import assert_that as hc_assert_that
@@ -60,7 +64,6 @@ class TestDataflowRunner(DataflowRunner):
 finally:
   if not self.result.is_in_terminal_state():
 self.result.cancel()
-  if options.view_as(StandardOptions).streaming:
 self.wait_until_in_state(PipelineState.CANCELLED, timeout=300)
 
 return self.result
diff --git a/sdks/python/apache_beam/testing/test_pipeline.py b/sdks/python/apache_beam/testing/test_pipeline.py
index 155190c..0525945 100644
--- 

[jira] [Work logged] (BEAM-4093) Support Python ValidatesRunner test against TestDataflowRunner in streaming

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4093?focusedWorklogId=93005=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93005
 ]

ASF GitHub Bot logged work on BEAM-4093:


Author: ASF GitHub Bot
Created on: 20/Apr/18 01:10
Start Date: 20/Apr/18 01:10
Worklog Time Spent: 10m 
  Work Description: aaltay closed pull request #5147: [BEAM-4093] Support 
Python ValidatesRunner test in streaming
URL: https://github.com/apache/beam/pull/5147
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py b/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py
index d0b53f50d79..5db1878f34f 100644
--- a/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py
+++ b/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py
@@ -37,6 +37,7 @@
 OUTPUT_SUB = 'wc_subscription_output'
 
 DEFAULT_INPUT_NUMBERS = 500
+WAIT_UNTIL_FINISH_DURATION = 3 * 60 * 1000   # in milliseconds
 
 
 class StreamingWordCountIT(unittest.TestCase):
@@ -87,6 +88,7 @@ def test_streaming_wordcount_it(self):
timeout=400)
 extra_opts = {'input_subscription': self.input_sub.full_name,
   'output_topic': self.output_topic.full_name,
+  'wait_until_finish_duration': WAIT_UNTIL_FINISH_DURATION,
   'on_success_matcher': all_of(state_verifier,
pubsub_msg_verifier)}
 
diff --git a/sdks/python/apache_beam/options/pipeline_options.py b/sdks/python/apache_beam/options/pipeline_options.py
index 7a2cd4bf1e4..b5f9d77617d 100644
--- a/sdks/python/apache_beam/options/pipeline_options.py
+++ b/sdks/python/apache_beam/options/pipeline_options.py
@@ -649,6 +649,13 @@ def _add_argparse_args(cls, parser):
 default=False,
 help=('Used in unit testing runners without submitting the '
   'actual job.'))
+parser.add_argument(
+'--wait_until_finish_duration',
+default=None,
+type=int,
+help='The time to wait (in milliseconds) for test pipeline to finish. '
+ 'If it is set to None, it will wait indefinitely until the job '
+ 'is finished.')
 
   def validate(self, validator):
 errors = []
diff --git a/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py b/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py
index 765ed245785..eedfa60f9fd 100644
--- a/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py
+++ b/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py
@@ -18,6 +18,7 @@
 """Wrapper of Beam runners that's built for running and verifying e2e tests."""
 from __future__ import print_function
 
+import logging
 import time
 
 from apache_beam.internal import pickler
@@ -37,6 +38,8 @@ def run_pipeline(self, pipeline):
 """Execute test pipeline and verify test matcher"""
 options = pipeline._options.view_as(TestOptions)
 on_success_matcher = options.on_success_matcher
+wait_duration = options.wait_until_finish_duration
+is_streaming = options.view_as(StandardOptions).streaming
 
 # [BEAM-1889] Do not send this to remote workers also, there is no need to
 # send this option to remote executors.
@@ -49,10 +52,11 @@ def run_pipeline(self, pipeline):
   print('Found: %s.' % self.build_console_url(pipeline.options))
 
 try:
-  if not options.view_as(StandardOptions).streaming:
-self.result.wait_until_finish()
-  else:
-self.wait_until_in_state(PipelineState.RUNNING)
+  self.wait_until_in_state(PipelineState.RUNNING)
+
+  if is_streaming and not wait_duration:
+logging.warning('Waiting indefinitely for streaming job.')
+  self.result.wait_until_finish(duration=wait_duration)
 
   if on_success_matcher:
 from hamcrest import assert_that as hc_assert_that
@@ -60,7 +64,6 @@ def run_pipeline(self, pipeline):
 finally:
   if not self.result.is_in_terminal_state():
 self.result.cancel()
-  if options.view_as(StandardOptions).streaming:
 self.wait_until_in_state(PipelineState.CANCELLED, timeout=300)
 
 return self.result
diff --git a/sdks/python/apache_beam/testing/test_pipeline.py b/sdks/python/apache_beam/testing/test_pipeline.py
index 155190c09a7..0525945f15f 100644
--- a/sdks/python/apache_beam/testing/test_pipeline.py
+++ b/sdks/python/apache_beam/testing/test_pipeline.py
@@ -102,7 +102,8 @@ def run(self, test_runner_api=True):
 result = super(TestPipeline, self).run(test_runner_api)
 if self.blocking:
   state = 

[jira] [Work logged] (BEAM-3249) Use Gradle to build/release project

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3249?focusedWorklogId=93004=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93004
 ]

ASF GitHub Bot logged work on BEAM-3249:


Author: ASF GitHub Bot
Created on: 20/Apr/18 01:10
Start Date: 20/Apr/18 01:10
Worklog Time Spent: 10m 
  Work Description: aaltay closed pull request #5117: [BEAM-3249] Clean-up 
and use shaded test jars, removing evaluationDependsOn
URL: https://github.com/apache/beam/pull/5117
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/examples/java/build.gradle b/examples/java/build.gradle
index 76889cafa59..4abe28b5209 100644
--- a/examples/java/build.gradle
+++ b/examples/java/build.gradle
@@ -26,15 +26,6 @@ ext.summary = """Apache Beam SDK provides a simple, Java-based
 interface for processing virtually any size data. This
 artifact includes all Apache Beam Java SDK examples."""
 
-/*
- * We need to rely on manually specifying these evaluationDependsOn to ensure that
- * the following projects are evaluated before we evaluate this project. This is because
- * we are attempting to reference the "sourceSets.test.output" directly.
- * TODO: Swap to generating test artifacts which we can then rely on instead of
- * the test outputs directly.
- */
-evaluationDependsOn(":beam-sdks-java-io-google-cloud-platform")
-
 /** Define the list of runners which execute a precommit test. */
 // https://issues.apache.org/jira/browse/BEAM-3583
 def preCommitRunners = ["dataflowRunner", "dataflowStreamingRunner", "directRunner", "flinkRunner", "sparkRunner"]
@@ -68,7 +59,7 @@ dependencies {
   shadow library.java.slf4j_api
   shadow project(path: ":beam-runners-direct-java", configuration: "shadow")
   shadow library.java.slf4j_jdk14
-  shadowTest project(":beam-sdks-java-io-google-cloud-platform").sourceSets.test.output
+  shadowTest project(path: ":beam-sdks-java-io-google-cloud-platform", configuration: "shadowTest")
   shadowTest library.java.hamcrest_core
   shadowTest library.java.junit
   shadowTest library.java.mockito_core
diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/game/GameStats.java b/examples/java/src/main/java/org/apache/beam/examples/complete/game/GameStats.java
index d81aa30892a..f93a24266a1 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/complete/game/GameStats.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/complete/game/GameStats.java
@@ -153,7 +153,7 @@ public void processElement(ProcessContext c, BoundedWindow window) {
   /**
* Options supported by {@link GameStats}.
*/
-  interface Options extends LeaderBoard.Options {
+  public interface Options extends LeaderBoard.Options {
 @Description("Numeric value of fixed window duration for user analysis, in minutes")
 @Default.Integer(60)
 Integer getFixedWindowDuration();
diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/game/HourlyTeamScore.java b/examples/java/src/main/java/org/apache/beam/examples/complete/game/HourlyTeamScore.java
index 86c5196c6cb..0f2912894d3 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/complete/game/HourlyTeamScore.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/complete/game/HourlyTeamScore.java
@@ -82,7 +82,7 @@
   /**
* Options supported by {@link HourlyTeamScore}.
*/
-  interface Options extends UserScore.Options {
+  public interface Options extends UserScore.Options {
 
 @Description("Numeric value of fixed window duration, in minutes")
 @Default.Integer(60)
diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/game/LeaderBoard.java b/examples/java/src/main/java/org/apache/beam/examples/complete/game/LeaderBoard.java
index 732840b1a1e..31dc48944e4 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/complete/game/LeaderBoard.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/complete/game/LeaderBoard.java
@@ -96,7 +96,7 @@
   /**
* Options supported by {@link LeaderBoard}.
*/
-  interface Options extends HourlyTeamScore.Options, ExampleOptions, StreamingOptions {
+  public interface Options extends HourlyTeamScore.Options, ExampleOptions, StreamingOptions {
 
 @Description("BigQuery Dataset to write tables to. Must already exist.")
 @Validation.Required
diff --git a/runners/apex/build.gradle b/runners/apex/build.gradle
index 713abb56cee..d7a57bad30b 100644
--- a/runners/apex/build.gradle
+++ b/runners/apex/build.gradle
@@ -27,11 +27,7 @@ description = "Apache Beam :: Runners :: Apex"
  * We need to rely on manually specifying these evaluationDependsOn to ensure 

[beam] 01/01: Merge pull request #5117 from swegner/luke5107

2018-04-19 Thread altay
This is an automated email from the ASF dual-hosted git repository.

altay pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit a3ba6a0e8de3ae72b8fc6fc6038eb9dc725f092e
Merge: e30e0c8 3fe7f43
Author: Ahmet Altay 
AuthorDate: Thu Apr 19 18:10:08 2018 -0700

Merge pull request #5117 from swegner/luke5107

[BEAM-3249] Clean-up and use shaded test jars, removing evaluationDependsOn

 examples/java/build.gradle | 11 +--
 .../org/apache/beam/examples/complete/game/GameStats.java  |  2 +-
 .../beam/examples/complete/game/HourlyTeamScore.java   |  2 +-
 .../apache/beam/examples/complete/game/LeaderBoard.java|  2 +-
 runners/apex/build.gradle  |  6 +-
 runners/direct-java/build.gradle   |  1 -
 runners/flink/build.gradle |  6 +-
 runners/gearpump/build.gradle  |  5 +
 runners/google-cloud-dataflow-java/build.gradle|  8 ++--
 runners/java-fn-execution/build.gradle |  9 -
 runners/spark/build.gradle |  6 +-
 sdks/java/core/build.gradle|  9 -
 sdks/java/harness/build.gradle |  9 -
 sdks/java/io/cassandra/build.gradle| 11 +--
 .../elasticsearch-tests/elasticsearch-tests-2/build.gradle | 14 ++
 .../elasticsearch-tests/elasticsearch-tests-5/build.gradle | 14 ++
 .../elasticsearch-tests-common/build.gradle| 11 +--
 sdks/java/io/file-based-io-tests/build.gradle  | 11 +--
 sdks/java/io/google-cloud-platform/build.gradle| 11 +--
 sdks/java/io/hadoop-input-format/build.gradle  | 11 +--
 sdks/java/io/jdbc/build.gradle | 11 +--
 sdks/java/io/mongodb/build.gradle  | 11 +--
 sdks/java/maven-archetypes/examples/build.gradle   |  6 +-
 23 files changed, 30 insertions(+), 157 deletions(-)


-- 
To stop receiving notification emails like this one, please contact
al...@apache.org.


[beam] branch master updated (e30e0c8 -> a3ba6a0)

2018-04-19 Thread altay
This is an automated email from the ASF dual-hosted git repository.

altay pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from e30e0c8  Merge pull request #5174 [BEAM-3812] Avoid pickling composite 
transforms.
 add dc9a71d  [BEAM-3249] Clean-up and use shaded test jars, removing 
evaluationDependsOn
 add 3fe7f43  Resolve private PipelineOptions conflicts for example tests.
 new a3ba6a0  Merge pull request #5117 from swegner/luke5107

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 examples/java/build.gradle | 11 +--
 .../org/apache/beam/examples/complete/game/GameStats.java  |  2 +-
 .../beam/examples/complete/game/HourlyTeamScore.java   |  2 +-
 .../apache/beam/examples/complete/game/LeaderBoard.java|  2 +-
 runners/apex/build.gradle  |  6 +-
 runners/direct-java/build.gradle   |  1 -
 runners/flink/build.gradle |  6 +-
 runners/gearpump/build.gradle  |  5 +
 runners/google-cloud-dataflow-java/build.gradle|  8 ++--
 runners/java-fn-execution/build.gradle |  9 -
 runners/spark/build.gradle |  6 +-
 sdks/java/core/build.gradle|  9 -
 sdks/java/harness/build.gradle |  9 -
 sdks/java/io/cassandra/build.gradle| 11 +--
 .../elasticsearch-tests/elasticsearch-tests-2/build.gradle | 14 ++
 .../elasticsearch-tests/elasticsearch-tests-5/build.gradle | 14 ++
 .../elasticsearch-tests-common/build.gradle| 11 +--
 sdks/java/io/file-based-io-tests/build.gradle  | 11 +--
 sdks/java/io/google-cloud-platform/build.gradle| 11 +--
 sdks/java/io/hadoop-input-format/build.gradle  | 11 +--
 sdks/java/io/jdbc/build.gradle | 11 +--
 sdks/java/io/mongodb/build.gradle  | 11 +--
 sdks/java/maven-archetypes/examples/build.gradle   |  6 +-
 23 files changed, 30 insertions(+), 157 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
al...@apache.org.


[jira] [Work logged] (BEAM-4097) Python SDK should set the environment in the job submission protos

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4097?focusedWorklogId=93002=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93002
 ]

ASF GitHub Bot logged work on BEAM-4097:


Author: ASF GitHub Bot
Created on: 20/Apr/18 01:08
Start Date: 20/Apr/18 01:08
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #5191: [BEAM-4097] Set 
environment for Python sdk function specs.
URL: https://github.com/apache/beam/pull/5191#issuecomment-382930963
 
 
   R: @angoenka


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 93002)
Time Spent: 20m  (was: 10m)

> Python SDK should set the environment in the job submission protos
> --
>
> Key: BEAM-4097
> URL: https://issues.apache.org/jira/browse/BEAM-4097
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4097) Python SDK should set the environment in the job submission protos

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4097?focusedWorklogId=93000=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-93000
 ]

ASF GitHub Bot logged work on BEAM-4097:


Author: ASF GitHub Bot
Created on: 20/Apr/18 01:07
Start Date: 20/Apr/18 01:07
Worklog Time Spent: 10m 
  Work Description: robertwb opened a new pull request #5191: [BEAM-4097] 
Set environment for Python sdk function specs.
URL: https://github.com/apache/beam/pull/5191
 
 
   This creates a default environment, and also allows the user to manually 
specify one of their choice.
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [ ] Write a pull request description that is detailed enough to 
understand:
  - [ ] What the pull request does
  - [ ] Why it does it
  - [ ] How it does it
  - [ ] Why this approach
- [ ] Each commit in the pull request should have a meaningful subject line 
and body.
- [ ] Run `./gradlew build` to make sure basic checks pass. A more thorough 
check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 93000)
Time Spent: 10m
Remaining Estimate: 0h

> Python SDK should set the environment in the job submission protos
> --
>
> Key: BEAM-4097
> URL: https://issues.apache.org/jira/browse/BEAM-4097
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3448) Propagate pipeline protos through Dataflow API from Go

2018-04-19 Thread Henning Rohde (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445070#comment-16445070
 ] 

Henning Rohde commented on BEAM-3448:
-

This change was made as part of Go windowing: 
https://github.com/apache/beam/pull/5179

> Propagate pipeline protos through Dataflow API from Go
> --
>
> Key: BEAM-3448
> URL: https://issues.apache.org/jira/browse/BEAM-3448
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>
> The Go SDK already stages the model pipeline. It also needs to use reference 
> IDs to it in the dataflow payload.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : beam_PostCommit_Python_Verify #4744

2018-04-19 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-3812) Avoid pickling PTransforms in proto representation

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3812?focusedWorklogId=92991=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92991
 ]

ASF GitHub Bot logged work on BEAM-3812:


Author: ASF GitHub Bot
Created on: 20/Apr/18 00:57
Start Date: 20/Apr/18 00:57
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #5174: 
[BEAM-3812] Avoid pickling composite transforms.
URL: https://github.com/apache/beam/pull/5174#discussion_r182922981
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/pubsub.py
 ##
 @@ -152,6 +152,11 @@ def expand(self, pvalue):
   pcoll.element_type = bytes
 return pcoll
 
+  def to_runner_api_parameter(self, context):
+# Required as this is identified by type in PTransformOverrides.
+# TODO(BEAM-3812): Use an actual URN here.
 
 Review comment:
   This is the right JIRA, avoiding pickling transforms. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 92991)
Time Spent: 0.5h  (was: 20m)

> Avoid pickling PTransforms in proto representation
> --
>
> Key: BEAM-3812
> URL: https://issues.apache.org/jira/browse/BEAM-3812
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Ahmet Altay
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Any transform that requires passing information through the runner protos 
> should have an explicit urn and payload. 
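
The idea in the description above can be illustrated with a minimal sketch. It is not Beam's actual registration mechanism: the registry, the `register_urn` decorator, the `Multiply` class, and the URN string are all hypothetical names chosen for illustration. It only shows how a transform can travel as an explicit (URN, payload) pair and be rebuilt with its type preserved, without pickling the object.

```python
import json

# Hypothetical registry mapping a URN string to the class that can rebuild
# itself from a payload. This is an illustration, not Beam's own registry.
_URN_REGISTRY = {}


def register_urn(urn):
  """Class decorator that records which class handles a given URN."""
  def decorator(cls):
    cls.URN = urn
    _URN_REGISTRY[urn] = cls
    return cls
  return decorator


@register_urn('example:transform:multiply:v1')  # made-up URN
class Multiply(object):

  def __init__(self, factor):
    self.factor = factor

  def to_spec(self):
    # Only the data needed to rebuild the transform goes into the payload.
    payload = json.dumps({'factor': self.factor}).encode('utf-8')
    return self.URN, payload

  @classmethod
  def from_payload(cls, payload):
    return cls(json.loads(payload.decode('utf-8'))['factor'])


def from_spec(urn, payload):
  """Looks up the class by URN and rebuilds the transform from its payload."""
  return _URN_REGISTRY[urn].from_payload(payload)
```

Because the receiving side looks up the class by URN, a type-based match on the rebuilt object still works, and the payload stays small and language-neutral.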



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (297448f -> e30e0c8)

2018-04-19 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 297448f  Merge pull request #5158: [BEAM-4062] Fix performance 
regression in FileBasedSink
 add f7468b0  [BEAM-3812] Avoid pickling composite transforms.
 add d04eca0  Fix PTransformOverloads that were relying on pickling.
 new e30e0c8  Merge pull request #5174 [BEAM-3812] Avoid pickling composite 
transforms.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/io/gcp/pubsub.py   | 10 +++
 sdks/python/apache_beam/pipeline.py| 11 +--
 sdks/python/apache_beam/pipeline_test.py   | 33 ++--
 sdks/python/apache_beam/portability/python_urns.py |  4 ++-
 sdks/python/apache_beam/transforms/core.py |  5 
 sdks/python/apache_beam/transforms/ptransform.py   | 35 ++
 sdks/python/apache_beam/utils/urns.py  |  4 ++-
 7 files changed, 56 insertions(+), 46 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
rober...@apache.org.


[jira] [Work logged] (BEAM-3812) Avoid pickling PTransforms in proto representation

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3812?focusedWorklogId=92995=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92995
 ]

ASF GitHub Bot logged work on BEAM-3812:


Author: ASF GitHub Bot
Created on: 20/Apr/18 00:57
Start Date: 20/Apr/18 00:57
Worklog Time Spent: 10m 
  Work Description: robertwb closed pull request #5174: [BEAM-3812] Avoid 
pickling composite transforms.
URL: https://github.com/apache/beam/pull/5174
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/io/gcp/pubsub.py b/sdks/python/apache_beam/io/gcp/pubsub.py
index f5ca17e64a1..e45dd23bfef 100644
--- a/sdks/python/apache_beam/io/gcp/pubsub.py
+++ b/sdks/python/apache_beam/io/gcp/pubsub.py
@@ -152,6 +152,11 @@ def expand(self, pvalue):
   pcoll.element_type = bytes
 return pcoll
 
+  def to_runner_api_parameter(self, context):
+# Required as this is identified by type in PTransformOverrides.
+# TODO(BEAM-3812): Use an actual URN here.
+return self.to_runner_api_pickled(context)
+
 
 class ReadStringsFromPubSub(PTransform):
   """A ``PTransform`` for reading utf-8 string payloads from Cloud Pub/Sub.
@@ -193,6 +198,11 @@ def expand(self, pcoll):
 pcoll.element_type = bytes
 return pcoll | Write(self._sink)
 
+  def to_runner_api_parameter(self, context):
+# Required as this is identified by type in PTransformOverrides.
+# TODO(BEAM-3812): Use an actual URN here.
+return self.to_runner_api_pickled(context)
+
 
 PROJECT_ID_REGEXP = '[a-z][-a-z0-9:.]{4,61}[a-z0-9]'
 SUBSCRIPTION_REGEXP = 'projects/([^/]+)/subscriptions/(.+)'
diff --git a/sdks/python/apache_beam/pipeline.py b/sdks/python/apache_beam/pipeline.py
index 74bd4cb17d0..31fe5c51952 100644
--- a/sdks/python/apache_beam/pipeline.py
+++ b/sdks/python/apache_beam/pipeline.py
@@ -572,7 +572,7 @@ class Visitor(PipelineVisitor):  # pylint: disable=used-before-assignment
   ok = True  # Really a nonlocal.
 
   def enter_composite_transform(self, transform_node):
-self.visit_transform(transform_node)
+pass
 
   def visit_transform(self, transform_node):
 try:
@@ -822,7 +822,7 @@ def transform_to_runner_api(transform, context):
   if transform is None:
 return None
   else:
-return transform.to_runner_api(context)
+return transform.to_runner_api(context, has_parts=bool(self.parts))
 return beam_runner_api_pb2.PTransform(
 unique_name=self.full_label,
 spec=transform_to_runner_api(self.transform, context),
@@ -893,6 +893,13 @@ class PTransformOverride(object):
   def matches(self, applied_ptransform):
 """Determines whether the given AppliedPTransform matches.
 
+Note that the matching will happen *after* Runner API proto translation.
+If matching is done via type checks, to/from_runner_api[_parameter] methods
+must be implemented to preserve the type (and other data) through proto
+serialization.
+
+Consider URN-based translation instead.
+
 Args:
   applied_ptransform: AppliedPTransform to be matched.
 
diff --git a/sdks/python/apache_beam/pipeline_test.py b/sdks/python/apache_beam/pipeline_test.py
index c3dd2296f20..ed27aa7658b 100644
--- a/sdks/python/apache_beam/pipeline_test.py
+++ b/sdks/python/apache_beam/pipeline_test.py
@@ -88,6 +88,9 @@ class DoubleParDo(beam.PTransform):
   def expand(self, input):
 return input | 'Inner' >> beam.Map(lambda a: a * 2)
 
+  def to_runner_api_parameter(self, context):
+return self.to_runner_api_pickled(context)
+
 
 class TripleParDo(beam.PTransform):
   def expand(self, input):
@@ -524,36 +527,6 @@ def test_dir(self):
 
 class RunnerApiTest(unittest.TestCase):
 
-  def test_simple(self):
-"""Tests serializing, deserializing, and running a simple pipeline.
-
-More extensive tests are done at pipeline.run for each suitable test.
-"""
-p = beam.Pipeline()
-p | beam.Create([None]) | beam.Map(lambda x: x)  # pylint: disable=expression-not-assigned
-proto = p.to_runner_api()
-
-p2 = Pipeline.from_runner_api(proto, p.runner, p._options)
-p2.run()
-
-  def test_pickling(self):
-class MyPTransform(beam.PTransform):
-  pickle_count = [0]
-
-  def expand(self, p):
-self.p = p
-return p | beam.Create([None])
-
-  def __reduce__(self):
-self.pickle_count[0] += 1
-return str, ()
-
-p = beam.Pipeline()
-for k in range(20):
-  p | 'Iter%s' % k >> MyPTransform()  # pylint: disable=expression-not-assigned
-p.to_runner_api()
-self.assertEqual(MyPTransform.pickle_count[0], 20)
-
   def test_parent_pointer(self):
 

[beam] 01/01: Merge pull request #5174 [BEAM-3812] Avoid pickling composite transforms.

2018-04-19 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit e30e0c807321934e862358e1e3be32dc74374aeb
Merge: 297448f d04eca0
Author: Robert Bradshaw 
AuthorDate: Thu Apr 19 17:57:43 2018 -0700

Merge pull request #5174 [BEAM-3812] Avoid pickling composite transforms.

 sdks/python/apache_beam/io/gcp/pubsub.py   | 10 +++
 sdks/python/apache_beam/pipeline.py| 11 +--
 sdks/python/apache_beam/pipeline_test.py   | 33 ++--
 sdks/python/apache_beam/portability/python_urns.py |  4 ++-
 sdks/python/apache_beam/transforms/core.py |  5 
 sdks/python/apache_beam/transforms/ptransform.py   | 35 ++
 sdks/python/apache_beam/utils/urns.py  |  4 ++-
 7 files changed, 56 insertions(+), 46 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
rober...@apache.org.


Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Spark_Gradle #151

2018-04-19 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Flink_Gradle #169

2018-04-19 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-4062) Performance regression in FileBasedSink

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4062?focusedWorklogId=92980=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92980
 ]

ASF GitHub Bot logged work on BEAM-4062:


Author: ASF GitHub Bot
Created on: 20/Apr/18 00:39
Start Date: 20/Apr/18 00:39
Worklog Time Spent: 10m 
  Work Description: chamikaramj closed pull request #5158: [BEAM-4062] Fix 
performance regression in FileBasedSink.
URL: https://github.com/apache/beam/pull/5158
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/io/filebasedsink.py b/sdks/python/apache_beam/io/filebasedsink.py
index ab3ab5fd5b3..126eb868abf 100644
--- a/sdks/python/apache_beam/io/filebasedsink.py
+++ b/sdks/python/apache_beam/io/filebasedsink.py
@@ -98,6 +98,8 @@ def __init__(self,
 self.num_shards = num_shards
 self.coder = coder
 self.shard_name_format = self._template_to_format(shard_name_template)
+self.shard_name_glob_format = self._template_to_glob_format(
+shard_name_template)
 self.compression_type = compression_type
 self.mime_type = mime_type
 
@@ -188,36 +190,58 @@ def _get_final_name(self, shard_num, num_shards):
 self.file_name_suffix.get()
 ])
 
-  def pre_finalize(self, init_result, writer_results):
-writer_results = sorted(writer_results)
-num_shards = len(writer_results)
-existing_files = []
-for shard_num in range(len(writer_results)):
-  final_name = self._get_final_name(shard_num, num_shards)
-  if FileSystems.exists(final_name):
-existing_files.append(final_name)
-if existing_files:
-  logging.info('Deleting existing files in target path: %d',
-   len(existing_files))
-  FileSystems.delete(existing_files)
+  @check_accessible(['file_path_prefix', 'file_name_suffix'])
+  def _get_final_name_glob(self, num_shards):
+return ''.join([
+self.file_path_prefix.get(),
+self.shard_name_glob_format % dict(num_shards=num_shards),
+self.file_name_suffix.get()
+])
 
-  @check_accessible(['file_path_prefix'])
-  def finalize_write(self, init_result, writer_results,
- unused_pre_finalize_results):
-writer_results = sorted(writer_results)
-num_shards = len(writer_results)
+  def pre_finalize(self, init_result, writer_results):
+num_shards = len(list(writer_results))
+dst_glob = self._get_final_name_glob(num_shards)
+dst_glob_files = [file_metadata.path
+  for mr in FileSystems.match([dst_glob])
+  for file_metadata in mr.metadata_list]
+
+if dst_glob_files:
+  logging.warn('Deleting %d existing files in target path matching: %s',
+   len(dst_glob_files), self.shard_name_glob_format)
+  FileSystems.delete(dst_glob_files)
+
+  def _check_state_for_finalize_write(self, writer_results, num_shards):
+"""Checks writer output files' states.
+
+Returns:
+  src_files, dst_files: Lists of files to rename. For each i, finalize_write
+should rename(src_files[i], dst_files[i]).
+  delete_files: Src files to delete. These could be leftovers from an
+incomplete (non-atomic) rename operation.
+  num_skipped: Tally of writer results files already renamed, such as from
+a previous run of finalize_write().
+"""
+if not writer_results:
+  return [], [], [], 0
+
+src_glob = FileSystems.join(FileSystems.split(writer_results[0])[0], '*')
+dst_glob = self._get_final_name_glob(num_shards)
+src_glob_files = set(file_metadata.path
+ for mr in FileSystems.match([src_glob])
+ for file_metadata in mr.metadata_list)
+dst_glob_files = set(file_metadata.path
+ for mr in FileSystems.match([dst_glob])
+ for file_metadata in mr.metadata_list)
 
 src_files = []
 dst_files = []
 delete_files = []
-chunk_size = FileSystems.get_chunk_size(self.file_path_prefix.get())
 num_skipped = 0
-for shard_num, shard in enumerate(writer_results):
+for shard_num, src in enumerate(writer_results):
   final_name = self._get_final_name(shard_num, num_shards)
-  src = shard
   dst = final_name
-  src_exists = FileSystems.exists(src)
-  dst_exists = FileSystems.exists(dst)
+  src_exists = src in src_glob_files
+  dst_exists = dst in dst_glob_files
   if not src_exists and not dst_exists:
 raise BeamIOError('src and dst files do not exist. src: %s, dst: %s' % (
 src, dst))
@@ -233,13 +257,23 @@ def finalize_write(self, init_result, writer_results,
 
   

[beam] branch master updated (dffe509 -> 297448f)

2018-04-19 Thread chamikara
This is an automated email from the ASF dual-hosted git repository.

chamikara pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from dffe509  Merge pull request #5122 Suppress some errorprone warnings in 
schemas.
 add 907792f  Fix performance regression in FileBasedSink.
 add 7f36efa  Address review comments and lint warnings.
 new 297448f  Merge pull request #5158: [BEAM-4062] Fix performance 
regression in FileBasedSink

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 sdks/python/apache_beam/io/filebasedsink.py | 117 
 1 file changed, 84 insertions(+), 33 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
chamik...@apache.org.


[jira] [Work logged] (BEAM-4062) Performance regression in FileBasedSink

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4062?focusedWorklogId=92979=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92979
 ]

ASF GitHub Bot logged work on BEAM-4062:


Author: ASF GitHub Bot
Created on: 20/Apr/18 00:38
Start Date: 20/Apr/18 00:38
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #5158: [BEAM-4062] Fix 
performance regression in FileBasedSink.
URL: https://github.com/apache/beam/pull/5158#issuecomment-382925438
 
 
   LGTM


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 92979)
Time Spent: 2h 20m  (was: 2h 10m)

> Performance regression in FileBasedSink
> ---
>
> Key: BEAM-4062
> URL: https://issues.apache.org/jira/browse/BEAM-4062
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Blocker
> Fix For: 2.5.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/beam/pull/4648] has added:
>  * 3 or more stat() calls per output file (in pre_finalize and 
> finalize_writes)
>  * serial unbatched delete()s (in pre_finalize)
> Solution will be to list files in a batch operation (match()), and to 
> delete() in batch mode, or use multiple threads if that's not possible.
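
The difference between per-file checks and a single batched listing, as described above, can be sketched roughly as follows. This is an illustration under stated assumptions, not the Beam implementation: `CountingFileSystem` is a hypothetical in-memory stand-in for a remote filesystem, and both helper functions are made-up names. The real change uses `FileSystems.match()` and `FileSystems.delete()` as shown in the pull request diff.

```python
import fnmatch


class CountingFileSystem(object):
  """Hypothetical in-memory filesystem that counts round trips."""

  def __init__(self, paths):
    self.paths = set(paths)
    self.calls = 0

  def exists(self, path):
    self.calls += 1
    return path in self.paths

  def match(self, pattern):
    # One call returns every path matching the glob pattern.
    self.calls += 1
    return sorted(p for p in self.paths if fnmatch.fnmatchcase(p, pattern))

  def delete(self, paths):
    # One call deletes the whole batch.
    self.calls += 1
    self.paths.difference_update(paths)


def delete_existing_per_file(fs, final_names):
  """Old shape: one exists() call per expected output file."""
  existing = [name for name in final_names if fs.exists(name)]
  if existing:
    fs.delete(existing)
  return existing


def delete_existing_batched(fs, dst_glob):
  """New shape: a single match() call lists all candidates at once."""
  existing = fs.match(dst_glob)
  if existing:
    fs.delete(existing)
  return existing
```

With N expected shards the per-file version costs N + 1 calls, while the batched version costs 2 regardless of N, which is the saving the issue is after when each call is a remote request.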



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] 01/01: Merge pull request #5158: [BEAM-4062] Fix performance regression in FileBasedSink

2018-04-19 Thread chamikara
This is an automated email from the ASF dual-hosted git repository.

chamikara pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 297448f36a289270a4fc7d20e162ec75340cf286
Merge: dffe509 7f36efa
Author: Chamikara Jayalath 
AuthorDate: Thu Apr 19 17:39:06 2018 -0700

Merge pull request #5158: [BEAM-4062] Fix performance regression in 
FileBasedSink

 sdks/python/apache_beam/io/filebasedsink.py | 117 
 1 file changed, 84 insertions(+), 33 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
chamik...@apache.org.


[jira] [Commented] (BEAM-3886) Python SDK harness does not contact State API if not needed

2018-04-19 Thread Thomas Weise (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445049#comment-16445049
 ] 

Thomas Weise commented on BEAM-3886:


[~lcwik] had started work in this:  
[https://github.com/lukecwik/incubator-beam/commit/fb2b18617eae8b4e1c67af5f233f017ea1b4fb41]

 

> Python SDK harness does not contact State API if not needed
> ---
>
> Key: BEAM-3886
> URL: https://issues.apache.org/jira/browse/BEAM-3886
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Ben Sidhom
>Assignee: Ahmet Altay
>Priority: Minor
>
> The Python harness always talks to the State API, even if it is never used by 
> the current process bundle. As a minor optimization and to make implementing 
> new runners easier, the harness should not talk to the State server unless 
> it's actually needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=92978=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92978
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 20/Apr/18 00:36
Start Date: 20/Apr/18 00:36
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r182920837
 
 

 ##
 File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ControlClientPool.java
 ##
 @@ -17,16 +17,12 @@
  */
 package org.apache.beam.runners.fnexecution.control;
 
-import org.apache.beam.sdk.fn.function.ThrowingConsumer;
-import org.apache.beam.sdk.util.ThrowingSupplier;
-
 /** Control client pool that exposes a source and sink of control clients. */
-public interface ControlClientPool {
+public interface ControlClientPool {
 
   /** Source of control clients. */
 
 Review comment:
   
   
   > **jkff** wrote:
   > Generally, why the higher-order functions here: why not have 
ControlClientPool have a function for getting a client, and a function for 
adding a client to the pool? That would match how other pool-like abstractions 
typically work.
   
   
   I want to minimize the surface area that consumers of these endpoints can 
interact with. We previously bundled everything into a single pool that could 
be read from or written to, but it made separation of responsibilities very 
unclear and the APIs hard to work with.
   
   In general, the owner of the pool will be separate from the client producers 
and consumers. With this interface, the owner is the _only_ one with access to 
the client pool itself. The producer of incoming control connections (in 
practice, the gRPC service) should only have access to an object sink into 
which it feeds clients. The consumer (for example, the docker factory) only has 
access to the source of incoming clients.
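
The separation of responsibilities described in this comment can be sketched in a language-neutral way. The pull request itself is Java; the snippet below is a Python illustration only, and `ClientPool`, `sink()`, and `source()` are hypothetical names rather than the interfaces in the PR. It assumes a simple blocking queue behind the pool.

```python
import queue


class ClientPool(object):
  """Hypothetical owner-side pool; only the owner holds this object."""

  def __init__(self):
    self._clients = queue.Queue()

  def sink(self):
    """Returns a write-only callable, handed to the producer of clients."""
    return self._clients.put

  def source(self):
    """Returns a read-only callable, handed to the consumer of clients."""
    def take(timeout=None):
      # Blocks until a client is available or the timeout expires.
      return self._clients.get(timeout=timeout)
    return take
```

In this shape the producer (for example a service accepting incoming connections) receives only `pool.sink()` and the consumer receives only `pool.source()`, so neither can reach the other half of the pool.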




Issue Time Tracking
---

Worklog Id: (was: 92978)
Time Spent: 11h 50m  (was: 11h 40m)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments





[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=92977=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92977
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 20/Apr/18 00:36
Start Date: 20/Apr/18 00:36
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r182920831
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientPoolService.java
 ##
 @@ -34,14 +33,13 @@
 implements FnService {
   private static final Logger LOGGER = LoggerFactory.getLogger(FnApiControlClientPoolService.class);
 
-  private final ThrowingConsumer clientPool;
+  private final ControlClientSink clientPool;
 
 Review comment:
   
   
   > **jkff** wrote:
   > Should it be called clientSink, or should the type be ControlClientPool?
   
   
   It should be called `clientSink`. This is residual from a previous refactor.




Issue Time Tracking
---

Worklog Id: (was: 92977)
Time Spent: 11h 40m  (was: 11.5h)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments





[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=92973=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92973
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 20/Apr/18 00:36
Start Date: 20/Apr/18 00:36
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r182920826
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerWrapper.java
 ##
 @@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import static com.google.common.base.Preconditions.checkArgument;
+
+import com.google.common.collect.ImmutableList;
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.nio.charset.StandardCharsets;
+import java.time.Duration;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.regex.Pattern;
+import java.util.stream.Collectors;
+
+/** A docker command wrapper. Simplifies communications with the Docker daemon. */
+class DockerWrapper {
+  // TODO: Should we require 64-character container ids? Docker technically allows abbreviated ids,
+  // but we _should_ always capture full ids.
+  private static final Pattern CONTAINER_ID_PATTERN = Pattern.compile("\\p{XDigit}{64}");
+
+  static DockerWrapper forCommand(String dockerExecutable, Duration commandTimeout) {
+    return new DockerWrapper(dockerExecutable, commandTimeout);
+  }
+
+  private final String dockerExecutable;
+  private final Duration commandTimeout;
+
+  private DockerWrapper(String dockerExecutable, Duration commandTimeout) {
+    this.dockerExecutable = dockerExecutable;
+    this.commandTimeout = commandTimeout;
+  }
+
+  /**
+   * Runs the given container image with the given command line arguments. Returns the running
+   * container id.
+   */
+  public String runImage(String imageTag, List<String> args)
+      throws IOException, TimeoutException, InterruptedException {
+    checkArgument(!imageTag.isEmpty(), "Docker image tag required");
+    // TODO: Validate args?
+    return runShortCommand(
+        ImmutableList.<String>builder()
+            .add(dockerExecutable)
+            .add("run")
+            .add("-d")
+            .add(imageTag)
+            .addAll(args)
+            .build());
+  }
+
+  /**
+   * Kills a docker container by container id.
+   *
+   * @throws IOException if an IOException occurs or if the given container id does not exist
+   */
+  public void killContainer(String containerId)
+      throws IOException, TimeoutException, InterruptedException {
+    checkArgument(containerId != null);
+    checkArgument(
+        CONTAINER_ID_PATTERN.matcher(containerId).matches(),
+        "Container ID must be a 64-character hexadecimal string");
+    runShortCommand(Arrays.asList(dockerExecutable, "kill", containerId));
+  }
+
+  /** Run the given command invocation and return stdout as a String. */
+  private String runShortCommand(List<String> invocation)
+      throws IOException, TimeoutException, InterruptedException {
+    ProcessBuilder pb = new ProcessBuilder(invocation);
+    Process process = pb.start();
+    // TODO: Consider supplying executor service here.
+    CompletableFuture<String> resultString =
+        CompletableFuture.supplyAsync(
+            () -> {
+              // NOTE: We do not own the underlying stream and do not close it.
+              BufferedReader reader =
+                  new BufferedReader(
+                      new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8));
 
 Review comment:
   
   
   > **jkff** wrote:
   > Should we capture the error stream also?
   
   
   We could. I went through several iterations of this 

[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=92974=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92974
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 20/Apr/18 00:36
Start Date: 20/Apr/18 00:36
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r182920828
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import static com.google.common.base.Preconditions.checkArgument;
+
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.Arrays;
+import java.util.List;
+import java.util.function.Supplier;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.control.ControlClientSource;
+import org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService;
+import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+
+/**
+ * An {@link EnvironmentFactory} that creates docker containers by shelling out to docker. Not
+ * thread-safe.
+ */
+public class DockerEnvironmentFactory implements EnvironmentFactory {
+
+  public static DockerEnvironmentFactory forServices(
+  DockerWrapper docker,
+  GrpcFnServer controlServiceServer,
+  GrpcFnServer loggingServiceServer,
+  GrpcFnServer retrievalServiceServer,
+  GrpcFnServer provisioningServiceServer,
+  ControlClientSource clientSource,
+  // TODO: Refine this to IdGenerator when we determine where that should 
live.
+  Supplier idGenerator) {
+return new DockerEnvironmentFactory(
+docker,
+controlServiceServer,
+loggingServiceServer,
+retrievalServiceServer,
+provisioningServiceServer,
+idGenerator,
+clientSource);
+  }
+
+  private final DockerWrapper docker;
+  private final GrpcFnServer 
controlServiceServer;
+  private final GrpcFnServer loggingServiceServer;
+  private final GrpcFnServer retrievalServiceServer;
+  private final GrpcFnServer 
provisioningServiceServer;
+  private final Supplier idGenerator;
+  private final ControlClientSource clientSource;
+
+  private RemoteEnvironment dockerEnvironment = null;
+
+  private DockerEnvironmentFactory(
+  DockerWrapper docker,
+  GrpcFnServer controlServiceServer,
+  GrpcFnServer loggingServiceServer,
+  GrpcFnServer retrievalServiceServer,
+  GrpcFnServer provisioningServiceServer,
+  Supplier idGenerator,
+  ControlClientSource clientSource) {
+this.docker = docker;
+this.controlServiceServer = controlServiceServer;
+this.loggingServiceServer = loggingServiceServer;
+this.retrievalServiceServer = retrievalServiceServer;
+this.provisioningServiceServer = provisioningServiceServer;
+this.idGenerator = idGenerator;
+this.clientSource = clientSource;
+  }
+
+  /** Creates an active {@link RemoteEnvironment} backed by a local Docker 
container. */
+  @Override
+  public RemoteEnvironment createEnvironment(Environment environment) throws 
Exception {
+if (dockerEnvironment == null) {
+  dockerEnvironment = createDockerEnv(environment);
+} else {
+  checkArgument(
+  
environment.getUrl().equals(dockerEnvironment.getEnvironment().getUrl()),
+  "A %s must only be queried for a single %s. Existing %s, Argument 
%s",
+  DockerEnvironmentFactory.class.getSimpleName(),
+  Environment.class.getSimpleName(),
+  

[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=92975=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92975
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 20/Apr/18 00:36
Start Date: 20/Apr/18 00:36
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r182920827
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import static com.google.common.base.Preconditions.checkArgument;
+
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.Arrays;
+import java.util.List;
+import java.util.function.Supplier;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.control.ControlClientSource;
+import org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService;
+import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+
+/**
+ * An {@link EnvironmentFactory} that creates docker containers by shelling out to docker. Not
 
 Review comment:
   
   
   > **jkff** wrote:
   > Mention the requirement from line 94? Does that mean you need one DockerEnvironmentFactory for each type of container (e.g. Java, Python, Go) or is it more / less strict?
   
   
   That was an outdated requirement and was meant to be removed. I'll add a 
test to make sure it works multiple times. ;)




Issue Time Tracking
---

Worklog Id: (was: 92975)
Time Spent: 11.5h  (was: 11h 20m)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments





[jira] [Work logged] (BEAM-3327) Add abstractions to manage Environment Instance lifecycles.

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3327?focusedWorklogId=92972=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92972
 ]

ASF GitHub Bot logged work on BEAM-3327:


Author: ASF GitHub Bot
Created on: 20/Apr/18 00:36
Start Date: 20/Apr/18 00:36
Worklog Time Spent: 10m 
  Work Description: bsidhom commented on a change in pull request #5189: 
[BEAM-3327] Basic Docker environment factory
URL: https://github.com/apache/beam/pull/5189#discussion_r182920825
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerContainerEnvironment.java
 ##
 @@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
+import org.apache.beam.runners.fnexecution.control.SdkHarnessClient;
+
+/**
+ * A {@link RemoteEnvironment} that wraps a running Docker container.
+ *
+ * Accessors are thread-compatible.
+ */
+class DockerContainerEnvironment implements RemoteEnvironment {
+
+  static DockerContainerEnvironment create(
 
 Review comment:
   
   
   > **jkff** wrote:
   > Mention in a comment that this takes ownership of instructionHandler and the Docker container?
   
   
   Done.




Issue Time Tracking
---

Worklog Id: (was: 92972)
Time Spent: 11h  (was: 10h 50m)

> Add abstractions to manage Environment Instance lifecycles.
> ---
>
> Key: BEAM-3327
> URL: https://issues.apache.org/jira/browse/BEAM-3327
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Thomas Groh
>Assignee: Ben Sidhom
>Priority: Major
>  Labels: portability
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> This permits remote stage execution for arbitrary environments





[jira] [Work logged] (BEAM-4038) Support Kafka Headers in KafkaIO

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4038?focusedWorklogId=92969=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92969
 ]

ASF GitHub Bot logged work on BEAM-4038:


Author: ASF GitHub Bot
Created on: 20/Apr/18 00:34
Start Date: 20/Apr/18 00:34
Worklog Time Spent: 10m 
  Work Description: gkumar7 commented on a change in pull request #5111: 
[BEAM-4038] Support Kafka Headers in KafkaIO
URL: https://github.com/apache/beam/pull/5111#discussion_r182916997
 
 

 ##
 File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaRecordCoder.java
 ##
 @@ -66,9 +76,34 @@ public void encode(KafkaRecord value, OutputStream 
outStream) throws IOExc
 longCoder.decode(inStream),
 longCoder.decode(inStream),
 KafkaTimestampType.forOrdinal(intCoder.decode(inStream)),
+(Headers) toHeaders(headerCoder.decode(inStream)),
 kvCoder.decode(inStream));
   }
 
+  private Object toHeaders(Iterable> records) {
+if (!records.iterator().hasNext()){
+  return null;
+}
+
+// ConsumerRecord is used to simply create a list of headers
+ConsumerRecord consumerRecord = new ConsumerRecord<>(
 
 Review comment:
   ```Headers headers = ... ```
   raises a ```ClassNotFoundException``` when serializing 
```KafkaRecordCoder```.
   
   Updated to use empty strings instead of "key", "value"




Issue Time Tracking
---

Worklog Id: (was: 92969)
Time Spent: 7h 40m  (was: 7.5h)

> Support Kafka Headers in KafkaIO
> 
>
> Key: BEAM-4038
> URL: https://issues.apache.org/jira/browse/BEAM-4038
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Geet Kumar
>Assignee: Raghu Angadi
>Priority: Minor
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Headers have been added to Kafka Consumer/Producer records (KAFKA-4208). The 
> purpose of this JIRA is to support this feature in KafkaIO.  
>  





[jira] [Work logged] (BEAM-4038) Support Kafka Headers in KafkaIO

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4038?focusedWorklogId=92971=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92971
 ]

ASF GitHub Bot logged work on BEAM-4038:


Author: ASF GitHub Bot
Created on: 20/Apr/18 00:34
Start Date: 20/Apr/18 00:34
Worklog Time Spent: 10m 
  Work Description: gkumar7 commented on a change in pull request #5111: 
[BEAM-4038] Support Kafka Headers in KafkaIO
URL: https://github.com/apache/beam/pull/5111#discussion_r182917521
 
 

 ##
 File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaRecordCoder.java
 ##
 @@ -66,9 +76,34 @@ public void encode(KafkaRecord value, OutputStream 
outStream) throws IOExc
 longCoder.decode(inStream),
 longCoder.decode(inStream),
 KafkaTimestampType.forOrdinal(intCoder.decode(inStream)),
+(Headers) toHeaders(headerCoder.decode(inStream)),
 kvCoder.decode(inStream));
   }
 
+  private Object toHeaders(Iterable> records) {
+if (!records.iterator().hasNext()){
+  return null;
 
 Review comment:
   Done




Issue Time Tracking
---

Worklog Id: (was: 92971)
Time Spent: 7h 50m  (was: 7h 40m)

> Support Kafka Headers in KafkaIO
> 
>
> Key: BEAM-4038
> URL: https://issues.apache.org/jira/browse/BEAM-4038
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Geet Kumar
>Assignee: Raghu Angadi
>Priority: Minor
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Headers have been added to Kafka Consumer/Producer records (KAFKA-4208). The 
> purpose of this JIRA is to support this feature in KafkaIO.  
>  





[jira] [Work logged] (BEAM-4038) Support Kafka Headers in KafkaIO

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4038?focusedWorklogId=92970=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92970
 ]

ASF GitHub Bot logged work on BEAM-4038:


Author: ASF GitHub Bot
Created on: 20/Apr/18 00:34
Start Date: 20/Apr/18 00:34
Worklog Time Spent: 10m 
  Work Description: gkumar7 commented on a change in pull request #5111: 
[BEAM-4038] Support Kafka Headers in KafkaIO
URL: https://github.com/apache/beam/pull/5111#discussion_r182920256
 
 

 ##
 File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConsumerSpEL.java
 ##
 @@ -139,4 +156,11 @@ public long offsetForTime(Consumer consumer, 
TopicPartition topicPartition
   return offsetAndTimestamp.offset();
 }
   }
+
+  public Headers getHeaders(ConsumerRecord rawRecord) {
 
 Review comment:
   This is used by ```KafkaUnboundedReader```. Would you like to move this 
logic to that class?




Issue Time Tracking
---

Worklog Id: (was: 92970)

> Support Kafka Headers in KafkaIO
> 
>
> Key: BEAM-4038
> URL: https://issues.apache.org/jira/browse/BEAM-4038
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Geet Kumar
>Assignee: Raghu Angadi
>Priority: Minor
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Headers have been added to Kafka Consumer/Producer records (KAFKA-4208). The 
> purpose of this JIRA is to support this feature in KafkaIO.  
>  





[jira] [Created] (BEAM-4149) Java SDK Harness should populate worker id in control plane headers

2018-04-19 Thread Ben Sidhom (JIRA)
Ben Sidhom created BEAM-4149:


 Summary: Java SDK Harness should populate worker id in control 
plane headers
 Key: BEAM-4149
 URL: https://issues.apache.org/jira/browse/BEAM-4149
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-harness
Reporter: Ben Sidhom
Assignee: Luke Cwik


The Java SDK harness currently does nothing to populate control plane headers 
with the harness worker id. This id is necessary in order to identify harnesses 
when multiple are run from the same runner control server.

Note that this affects the _Java_ harness specifically (e.g., when running a 
local process or in-memory harness). When the harness is launched within the 
docker container, the go boot code takes care of setting this: 
https://github.com/apache/beam/blob/dffe50924f34d3cc994008703f01e802c99913d2/sdks/java/container/boot.go#L70





[jira] [Work logged] (BEAM-3906) Get Python Wheel Validation Automated

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3906?focusedWorklogId=92968=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92968
 ]

ASF GitHub Bot logged work on BEAM-3906:


Author: ASF GitHub Bot
Created on: 20/Apr/18 00:27
Start Date: 20/Apr/18 00:27
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #4943: [BEAM-3906] Automate 
Validation Aganist Python Wheel
URL: https://github.com/apache/beam/pull/4943#issuecomment-382923316
 
 
   Run Seed Job




Issue Time Tracking
---

Worklog Id: (was: 92968)
Time Spent: 16h  (was: 15h 50m)

> Get Python Wheel Validation Automated
> -
>
> Key: BEAM-3906
> URL: https://issues.apache.org/jira/browse/BEAM-3906
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-python, testing
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 16h
>  Remaining Estimate: 0h
>






Build failed in Jenkins: beam_PerformanceTests_MongoDBIO_IT #71

2018-04-19 Thread Apache Jenkins Server
See 


Changes:

[robertwb] Suppress some (fatal) warnings in schemas.

[tgroh] Remove SdkHarnessClientControlService

[tgroh] Add a Test for Length Prefix between Coders

[robertwb] Allow manual specification of external address for ULR.

[robertwb] Infer all coders before passing to runner API.

[robertwb] Bugfix changes from apache/beam#4811

[aromanenko.dev] [BEAM-3484] Fix split issue in HadoopInputFormatIOIT

--
[...truncated 90.96 KB...]
[INFO] Excluding io.netty:netty-codec-http:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-handler-proxy:jar:4.1.8.Final from the shaded 
jar.
[INFO] Excluding io.netty:netty-codec-socks:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-handler:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-buffer:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-common:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-transport:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-resolver:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.netty:netty-codec:jar:4.1.8.Final from the shaded jar.
[INFO] Excluding io.grpc:grpc-stub:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-core:jar:1.0.2 from the shaded 
jar.
[INFO] Excluding org.json:json:jar:20160810 from the shaded jar.
[INFO] Excluding com.google.cloud:google-cloud-spanner:jar:0.20.0b-beta from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-cloud-spanner-v1:jar:0.1.11b 
from the shaded jar.
[INFO] Excluding 
com.google.api.grpc:proto-google-cloud-spanner-admin-instance-v1:jar:0.1.11 
from the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-cloud-spanner-v1:jar:0.1.11b 
from the shaded jar.
[INFO] Excluding 
com.google.api.grpc:grpc-google-cloud-spanner-admin-database-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding 
com.google.api.grpc:grpc-google-cloud-spanner-admin-instance-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:grpc-google-longrunning-v1:jar:0.1.11 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-longrunning-v1:jar:0.1.11 
from the shaded jar.
[INFO] Excluding com.google.cloud.bigtable:bigtable-protos:jar:1.0.0-pre3 from 
the shaded jar.
[INFO] Excluding com.google.cloud.bigtable:bigtable-client-core:jar:1.0.0 from 
the shaded jar.
[INFO] Excluding commons-logging:commons-logging:jar:1.2 from the shaded jar.
[INFO] Excluding com.google.auth:google-auth-library-appengine:jar:0.7.0 from 
the shaded jar.
[INFO] Excluding io.opencensus:opencensus-contrib-grpc-util:jar:0.7.0 from the 
shaded jar.
[INFO] Excluding io.opencensus:opencensus-api:jar:0.7.0 from the shaded jar.
[INFO] Excluding io.dropwizard.metrics:metrics-core:jar:3.1.2 from the shaded 
jar.
[INFO] Excluding 
com.google.api.grpc:proto-google-cloud-spanner-admin-database-v1:jar:0.1.9 from 
the shaded jar.
[INFO] Excluding com.google.api.grpc:proto-google-common-protos:jar:0.1.9 from 
the shaded jar.
[INFO] Excluding io.grpc:grpc-all:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-okhttp:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.squareup.okhttp:okhttp:jar:2.5.0 from the shaded jar.
[INFO] Excluding com.squareup.okio:okio:jar:1.6.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-protobuf-lite:jar:1.2.0 from the shaded jar.
[INFO] Excluding io.grpc:grpc-protobuf-nano:jar:1.2.0 from the shaded jar.
[INFO] Excluding com.google.protobuf.nano:protobuf-javanano:jar:3.0.0-alpha-5 
from the shaded jar.
[INFO] Excluding io.netty:netty-tcnative-boringssl-static:jar:1.1.33.Fork26 
from the shaded jar.
[INFO] Excluding 
org.apache.beam:beam-runners-core-construction-java:jar:2.5.0-SNAPSHOT from the 
shaded jar.
[INFO] Excluding org.apache.beam:beam-model-job-management:jar:2.5.0-SNAPSHOT 
from the shaded jar.
[INFO] Excluding com.google.protobuf:protobuf-java-util:jar:3.2.0 from the 
shaded jar.
[INFO] Excluding com.google.code.gson:gson:jar:2.7 from the shaded jar.
[INFO] Excluding com.google.api-client:google-api-client:jar:1.22.0 from the 
shaded jar.
[INFO] Excluding com.google.oauth-client:google-oauth-client:jar:1.22.0 from 
the shaded jar.
[INFO] Excluding com.google.http-client:google-http-client:jar:1.22.0 from the 
shaded jar.
[INFO] Excluding org.apache.httpcomponents:httpclient:jar:4.0.1 from the shaded 
jar.
[INFO] Excluding org.apache.httpcomponents:httpcore:jar:4.0.1 from the shaded 
jar.
[INFO] Excluding commons-codec:commons-codec:jar:1.3 from the shaded jar.
[INFO] Excluding com.google.http-client:google-http-client-jackson2:jar:1.22.0 
from the shaded jar.
[INFO] Excluding 
com.google.apis:google-api-services-dataflow:jar:v1b3-rev221-1.22.0 from the 
shaded jar.
[INFO] Excluding 
com.google.apis:google-api-services-clouddebugger:jar:v2-rev8-1.22.0 from the 

[jira] [Created] (BEAM-4148) Local server api descriptors contain urls that work on Mac and Linux

2018-04-19 Thread Ben Sidhom (JIRA)
Ben Sidhom created BEAM-4148:


 Summary: Local server api descriptors contain urls that work on 
Mac and Linux
 Key: BEAM-4148
 URL: https://issues.apache.org/jira/browse/BEAM-4148
 Project: Beam
  Issue Type: Improvement
  Components: runner-core
Reporter: Ben Sidhom
Assignee: Kenneth Knowles


Docker for Mac does not allow host networking and thus will not allow SDK 
harnesses to access runner services via `localhost`. Instead, a special DNS 
name is used to refer to the host machine: `docker.for.mac.host.internal`. (Note 
that this value sometimes changes between Docker releases.)

We should attempt to detect the host operating system and return different API 
descriptors based on this.

See 
[https://github.com/bsidhom/beam/commit/3adaeb0d33dc26f0910c1f8af2821cce4ee0b965]
 for how this might be done.
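
A minimal sketch of the OS detection proposed above, written in Python for brevity rather than in the runner's own language. The function names and the lookup table are hypothetical; the only values taken from this issue are `localhost` and `docker.for.mac.host.internal`, and the latter may differ between Docker releases.

```python
import platform

# Hypothetical table: host names an SDK harness container could use to reach
# runner services on the host machine. Only the values below come from the
# issue text; the Mac name may change between Docker releases.
_HOST_BY_SYSTEM = {
    "Darwin": "docker.for.mac.host.internal",
}
_DEFAULT_HOST = "localhost"


def runner_host_for_sdk_harness(system=None):
    """Return the host name to embed in a local API service descriptor."""
    if system is None:
        system = platform.system()  # "Darwin" on macOS, "Linux" on Linux
    return _HOST_BY_SYSTEM.get(system, _DEFAULT_HOST)


def api_service_url(port, system=None):
    """Build a host:port URL for a locally running runner service."""
    return "%s:%d" % (runner_host_for_sdk_harness(system), port)
```

Keeping the mapping in one table means a renamed Docker host name only needs a one-line change.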



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4044) Take advantage of Calcite DDL

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4044?focusedWorklogId=92961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92961
 ]

ASF GitHub Bot logged work on BEAM-4044:


Author: ASF GitHub Bot
Created on: 20/Apr/18 00:16
Start Date: 20/Apr/18 00:16
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #5154: [BEAM-4044] [SQL] 
Make BeamCalciteTable self planning
URL: https://github.com/apache/beam/pull/5154#issuecomment-382921290
 
 
   run java precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 92961)
Time Spent: 6h  (was: 5h 50m)

> Take advantage of Calcite DDL
> -
>
> Key: BEAM-4044
> URL: https://issues.apache.org/jira/browse/BEAM-4044
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> In Calcite 1.15, support for abstract DDL moved into Calcite core. We should 
> take advantage of that.





Build failed in Jenkins: beam_PerformanceTests_HadoopInputFormat #162

2018-04-19 Thread Apache Jenkins Server
See 


--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam4 (beam) in workspace 
/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_HadoopInputFormat
FATAL: java.io.IOException: Unexpected termination of the channel
java.io.EOFException
at 
java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2671)
at 
java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3146)
at 
java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:858)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:354)
at 
hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
at 
hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to beam4
at 
hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1693)
at hudson.remoting.Request.call(Request.java:192)
at hudson.remoting.Channel.call(Channel.java:907)
at hudson.FilePath.act(FilePath.java:986)
at hudson.FilePath.act(FilePath.java:975)
at org.jenkinsci.plugins.gitclient.Git.getClient(Git.java:137)
at hudson.plugins.git.GitSCM.createClient(GitSCM.java:795)
at hudson.plugins.git.GitSCM.createClient(GitSCM.java:786)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1154)
at hudson.scm.SCM.checkout(SCM.java:495)
at 
hudson.model.AbstractProject.checkout(AbstractProject.java:1202)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at 
jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1724)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at 
hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
Caused: hudson.remoting.RequestAbortedException
at hudson.remoting.Request.abort(Request.java:329)
at hudson.remoting.Channel.terminate(Channel.java:992)
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:96)


Build failed in Jenkins: beam_PerformanceTests_JDBC #473

2018-04-19 Thread Apache Jenkins Server
See 


--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam4 (beam) in workspace 
/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_JDBC
FATAL: java.io.IOException: Unexpected termination of the channel
java.io.EOFException
at 
java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2671)
at 
java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3146)
at 
java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:858)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:354)
at 
hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
at 
hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to beam4
at 
hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1693)
at hudson.remoting.Request.call(Request.java:192)
at hudson.remoting.Channel.call(Channel.java:907)
at hudson.FilePath.act(FilePath.java:986)
at hudson.FilePath.act(FilePath.java:975)
at org.jenkinsci.plugins.gitclient.Git.getClient(Git.java:137)
at hudson.plugins.git.GitSCM.createClient(GitSCM.java:795)
at hudson.plugins.git.GitSCM.createClient(GitSCM.java:786)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1154)
at hudson.scm.SCM.checkout(SCM.java:495)
at 
hudson.model.AbstractProject.checkout(AbstractProject.java:1202)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at 
jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1724)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at 
hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
Caused: hudson.remoting.RequestAbortedException
at hudson.remoting.Request.abort(Request.java:329)
at hudson.remoting.Channel.terminate(Channel.java:992)
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:96)


[jira] [Assigned] (BEAM-4097) Python SDK should set the environment in the job submission protos

2018-04-19 Thread Robert Bradshaw (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw reassigned BEAM-4097:
-

Assignee: Robert Bradshaw  (was: Ahmet Altay)

> Python SDK should set the environment in the job submission protos
> --
>
> Key: BEAM-4097
> URL: https://issues.apache.org/jira/browse/BEAM-4097
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>






[jira] [Work logged] (BEAM-3991) Update dependency 'google-api-services-storage' to latest version

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3991?focusedWorklogId=92939&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92939
 ]

ASF GitHub Bot logged work on BEAM-3991:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:47
Start Date: 19/Apr/18 23:47
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #5046: [BEAM-3991] 
Updating Google API client dependencies to 1.23 versions
URL: https://github.com/apache/beam/pull/5046#issuecomment-382915557
 
 
   Retest this please




Issue Time Tracking
---

Worklog Id: (was: 92939)
Time Spent: 1h  (was: 50m)

> Update dependency 'google-api-services-storage' to latest version
> -
>
> Key: BEAM-3991
> URL: https://issues.apache.org/jira/browse/BEAM-3991
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp, sdk-py-core
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
>Priority: Blocker
> Fix For: 2.5.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently we use version 'v1-rev71-1.22.0', which is deprecated and about two 
> years old.





[jira] [Work logged] (BEAM-4028) Step / Operation naming should rely on a NameContext class

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4028?focusedWorklogId=92937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92937
 ]

ASF GitHub Bot logged work on BEAM-4028:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:45
Start Date: 19/Apr/18 23:45
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5135: [BEAM-4028] 
Transitioning MapTask objects to NameContext
URL: https://github.com/apache/beam/pull/5135#issuecomment-382915146
 
 
   Run Python PostCommit




Issue Time Tracking
---

Worklog Id: (was: 92937)
Time Spent: 3h 50m  (was: 3h 40m)

> Step / Operation naming should rely on a NameContext class
> --
>
> Key: BEAM-4028
> URL: https://issues.apache.org/jira/browse/BEAM-4028
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Steps can have different names depending on the runner (stage, step, user, 
> system name...). 
> Depending on the needs of different components (operations, logging, metrics, 
> state sampling), these step names are passed around without a specific order.
> Instead, the SDK should rely on `NameContext` objects that carry all the naming 
> information for a single step.
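
To illustrate the idea in the description, here is a rough sketch of what such an object could look like. The field names (`step_name`, `user_name`, `system_name`) and the two helper methods are assumptions made for this example; they are not taken from the pull request.

```python
class NameContext(object):
    """Carries every name a single step may be known by (illustrative only)."""

    def __init__(self, step_name, user_name=None, system_name=None):
        self.step_name = step_name
        self.user_name = user_name
        self.system_name = system_name

    def logging_name(self):
        # Hypothetical helper: prefer the user-visible name when one exists.
        return self.user_name or self.step_name

    def metrics_name(self):
        # Hypothetical helper: prefer the runner's internal name when set.
        return self.system_name or self.step_name

    def __eq__(self, other):
        return (isinstance(other, NameContext) and
                (self.step_name, self.user_name, self.system_name) ==
                (other.step_name, other.user_name, other.system_name))

    def __hash__(self):
        return hash((self.step_name, self.user_name, self.system_name))
```

Passing one object instead of several positional strings removes the ordering problem: each component asks the context for the name it needs.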





[jira] [Work logged] (BEAM-4018) Add a ByteKeyRangeTracker based on RestrictionTracker for SDF

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4018?focusedWorklogId=92938&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92938
 ]

ASF GitHub Bot logged work on BEAM-4018:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:45
Start Date: 19/Apr/18 23:45
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #5175: [BEAM-4018] Add a 
ByteKeyRangeTracker based on RestrictionTracker for SDF
URL: https://github.com/apache/beam/pull/5175#issuecomment-382915277
 
 
   Sure, will take a look.




Issue Time Tracking
---

Worklog Id: (was: 92938)
Time Spent: 40m  (was: 0.5h)

> Add a ByteKeyRangeTracker based on RestrictionTracker for SDF
> -
>
> Key: BEAM-4018
> URL: https://issues.apache.org/jira/browse/BEAM-4018
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We can have a RestrictionTracker for ByteKey ranges as part of the core sdk 
> so it can be reused by future SDF-based IOs such as Bigtable and HBase, among others.
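
As a rough illustration of what a tracker over byte-key ranges does, the sketch below claims keys inside a half-open range. It is written in Python, not Java, and the class shape, the half-open `[start, end)` convention, and the increasing-order requirement are assumptions for this example, not the interface from the pull request.

```python
class ByteKeyRangeTracker(object):
    """Tracks claimed keys in the half-open byte-key range [start, end)."""

    def __init__(self, start, end):
        if not start < end:
            raise ValueError("start must be strictly less than end")
        self._start = start
        self._end = end
        self._last_claimed = None

    def try_claim(self, key):
        """Claim key; return False once key falls at or past the range end."""
        if key < self._start:
            raise ValueError("key precedes the start of the range")
        if self._last_claimed is not None and key <= self._last_claimed:
            raise ValueError("keys must be claimed in increasing order")
        if key >= self._end:
            return False
        self._last_claimed = key
        return True

    def current_restriction(self):
        return (self._start, self._end)
```

Python compares `bytes` values lexicographically, which matches the ordering a key-range scan over a store such as Bigtable or HBase would use.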





[jira] [Work logged] (BEAM-3812) Avoid pickling PTransforms in proto representation

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3812?focusedWorklogId=92931&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92931
 ]

ASF GitHub Bot logged work on BEAM-3812:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:42
Start Date: 19/Apr/18 23:42
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#5174: [BEAM-3812] Avoid pickling composite transforms.
URL: https://github.com/apache/beam/pull/5174#discussion_r182846932
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/pubsub.py
 ##
 @@ -152,6 +152,11 @@ def expand(self, pvalue):
   pcoll.element_type = bytes
 return pcoll
 
+  def to_runner_api_parameter(self, context):
+# Required as this is identified by type in PTransformOverrides.
+# TODO(BEAM-3812): Use an actual URN here.
 
 Review comment:
   Should this be a different JIRA?




Issue Time Tracking
---

Worklog Id: (was: 92931)
Time Spent: 20m  (was: 10m)

> Avoid pickling PTransforms in proto representation
> --
>
> Key: BEAM-3812
> URL: https://issues.apache.org/jira/browse/BEAM-3812
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Ahmet Altay
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Any transform that requires passing information through the runner protos 
> should have an explicit urn and payload. 
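
A small sketch of the "explicit urn and payload" idea. Only the method name `to_runner_api_parameter` appears in the diff above; the toy transform, the made-up URN, the registry, and the use of raw bytes as the payload (the real payload would normally be a proto message) are assumptions for illustration.

```python
# Hypothetical registry mapping a URN to a function that rebuilds the
# transform from its payload bytes.
_CONSTRUCTORS = {}


class PrefixLines(object):
    """Toy transform whose only state is a text prefix."""

    URN = "example:transform:prefix_lines:v1"  # made-up URN for this sketch

    def __init__(self, prefix):
        self.prefix = prefix

    def to_runner_api_parameter(self, context):
        # Explicit URN plus an explicit payload, instead of a pickle of self.
        return self.URN, self.prefix.encode("utf-8")

    @staticmethod
    def from_runner_api_parameter(payload, context):
        return PrefixLines(payload.decode("utf-8"))


_CONSTRUCTORS[PrefixLines.URN] = PrefixLines.from_runner_api_parameter


def round_trip(transform, context=None):
    """Serialize to (urn, payload) and rebuild through the registry."""
    urn, payload = transform.to_runner_api_parameter(context)
    return _CONSTRUCTORS[urn](payload, context)
```

Because the payload is plain data keyed by a URN, a runner written in another language can recognize the transform without unpickling Python objects.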





[jira] [Closed] (BEAM-4098) Handle WindowInto in the Java SDK Harness

2018-04-19 Thread Eugene Kirpichov (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov closed BEAM-4098.
--
   Resolution: Duplicate
Fix Version/s: Not applicable

> Handle WindowInto in the Java SDK Harness
> -
>
> Key: BEAM-4098
> URL: https://issues.apache.org/jira/browse/BEAM-4098
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Robert Bradshaw
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: Not applicable
>
>






[jira] [Work logged] (BEAM-3255) Update release process to use Gradle

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3255?focusedWorklogId=92922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92922
 ]

ASF GitHub Bot logged work on BEAM-3255:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:23
Start Date: 19/Apr/18 23:23
Worklog Time Spent: 10m 
  Work Description: pabloem opened a new pull request #424: [BEAM-3255] 
Updating release guide to use Gradle commands
URL: https://github.com/apache/beam-site/pull/424
 
 
   




Issue Time Tracking
---

Worklog Id: (was: 92922)
Time Spent: 3h 10m  (was: 3h)

> Update release process to use Gradle
> 
>
> Key: BEAM-3255
> URL: https://issues.apache.org/jira/browse/BEAM-3255
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Luke Cwik
>Assignee: Alan Myrvold
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This task is about configuring Gradle to generate POMs and artifacts 
> required to perform a release and update the nightly release snapshot Jenkins 
> jobs found here 
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_Release_NightlySnapshot.groovy
> We will also require some integration tests to run against the released 
> nightly snapshot artifacts to ensure that what was built is valid.





[jira] [Work logged] (BEAM-3249) Use Gradle to build/release project

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3249?focusedWorklogId=92921&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92921
 ]

ASF GitHub Bot logged work on BEAM-3249:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:21
Start Date: 19/Apr/18 23:21
Worklog Time Spent: 10m 
  Work Description: swegner commented on issue #5117: [BEAM-3249] Clean-up 
and use shaded test jars, removing evaluationDependsOn
URL: https://github.com/apache/beam/pull/5117#issuecomment-382910451
 
 
   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 92921)
Time Spent: 14h 10m  (was: 14h)

> Use Gradle to build/release project
> ---
>
> Key: BEAM-3249
> URL: https://issues.apache.org/jira/browse/BEAM-3249
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, testing
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 14h 10m
>  Remaining Estimate: 0h
>
> I have collected data by running several builds against master using Gradle 
> and Maven without using Gradle's support for incremental builds.
> Gradle (mins)
> min: 25.04
> max: 160.14
> median: 45.78
> average: 52.19
> stdev: 30.80
> Maven (mins)
> min: 56.86
> max: 216.55
> median: 87.93
> average: 109.10
> stdev: 48.01
> I excluded a few timeouts (240 mins) that happened during the Maven build 
> from its numbers, but we can still see that Gradle is about twice as fast 
> as Maven for the build when both are run on Jenkins.
> Original dev@ thread: 
> https://lists.apache.org/thread.html/225dddcfc78f39bbb296a0d2bbef1caf37e17677c7e5573f0b6fe253@%3Cdev.beam.apache.org%3E
> The data is available here 
> https://docs.google.com/spreadsheets/d/1MHVjF-xoI49_NJqEQakUgnNIQ7Qbjzu8Y1q_h3dbF1M/edit?usp=sharing
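
The "about twice as fast" summary can be checked directly from the figures quoted above. This short script only divides the Maven numbers by the Gradle ones; it uses no data beyond what the issue lists.

```python
# Build times in minutes, copied from the figures quoted in the issue.
gradle = {"min": 25.04, "max": 160.14, "median": 45.78, "average": 52.19}
maven = {"min": 56.86, "max": 216.55, "median": 87.93, "average": 109.10}

# A ratio above 1 means Maven took longer than Gradle for that statistic.
ratios = {name: maven[name] / gradle[name] for name in gradle}

for name in ("min", "max", "median", "average"):
    print("%-8s Maven/Gradle = %.2f" % (name, ratios[name]))
```

The median and average ratios come out at roughly 1.9 and 2.1, which is where the summary comes from; the ratio of the maximum times is noticeably lower.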





[jira] [Work logged] (BEAM-3983) BigQuery writes from pure SQL

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3983?focusedWorklogId=92917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92917
 ]

ASF GitHub Bot logged work on BEAM-3983:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:20
Start Date: 19/Apr/18 23:20
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #4947: [BEAM-3983] Add 
utils for converting to BigQuery types
URL: https://github.com/apache/beam/pull/4947#issuecomment-382910221
 
 
   run java precommit




Issue Time Tracking
---

Worklog Id: (was: 92917)
Time Spent: 6h 50m  (was: 6h 40m)

> BigQuery writes from pure SQL
> -
>
> Key: BEAM-3983
> URL: https://issues.apache.org/jira/browse/BEAM-3983
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> It would be nice if you could write to BigQuery in SQL without writing any 
> Java code. For example:
> {code:java}
> INSERT INTO bigquery SELECT * FROM PCOLLECTION{code}





[jira] [Assigned] (BEAM-3970) Java SDK harness supports window_into

2018-04-19 Thread Eugene Kirpichov (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov reassigned BEAM-3970:
--

Assignee: Eugene Kirpichov  (was: Luke Cwik)

> Java SDK harness supports window_into
> -
>
> Key: BEAM-3970
> URL: https://issues.apache.org/jira/browse/BEAM-3970
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Ben Sidhom
>Assignee: Eugene Kirpichov
>Priority: Major
>
> The Java SDK harness does not currently register a PTransformRunnerFactory 
> for beam:transform:window_into:v1. We need this functionality for GroupByKey 
> transforms.
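
The failure described here is a missing entry in a URN-keyed registry. The sketch below shows that lookup pattern in Python; the registry class and its methods are hypothetical and do not mirror the Java harness API. Only the URN string is taken from the issue.

```python
class RunnerFactoryRegistry(object):
    """Maps a transform URN to a callable that builds its runner."""

    def __init__(self):
        self._factories = {}

    def register(self, urn, factory):
        if urn in self._factories:
            raise ValueError("duplicate registration for %s" % urn)
        self._factories[urn] = factory

    def create(self, urn, *args):
        try:
            factory = self._factories[urn]
        except KeyError:
            # The failure mode the issue describes: no factory is
            # registered for the URN, so the transform cannot be executed.
            raise LookupError("no runner factory registered for %s" % urn)
        return factory(*args)


WINDOW_INTO_URN = "beam:transform:window_into:v1"  # URN quoted in the issue
```

Until something is registered for `WINDOW_INTO_URN`, any bundle containing that transform fails at lookup time, which is why GroupByKey pipelines depend on this issue.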





[jira] [Work logged] (BEAM-3303) Go windowing support

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3303?focusedWorklogId=92909&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92909
 ]

ASF GitHub Bot logged work on BEAM-3303:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:18
Start Date: 19/Apr/18 23:18
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #5179: 
[BEAM-3303] Add Go Windowing support
URL: https://github.com/apache/beam/pull/5179#discussion_r182910787
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/exec/pardo.go
 ##
 @@ -95,26 +96,56 @@ func (n *ParDo) ProcessElement(ctx context.Context, elm FullValue, values ...ReS
}
 
ctx = metrics.SetPTransformID(ctx, n.PID)
+   fn := n.Fn.ProcessElementFn()
 
-   val, err := n.invokeDataFn(ctx, elm.Timestamp, n.Fn.ProcessElementFn(), 
{Key: elm, Values: values})
-   if err != nil {
-   return n.fail(err)
-   }
+   // If the function observes windows, we must pass invoke the it for each window. The expected fast path
 
 Review comment:
   
   s/the it/it/
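
For readers following the hunk above, here is a rough Python sketch of the behavior the new code comment describes: one call per window when the function observes windows, and a single call otherwise. The function and parameter names are hypothetical, and treating "does not observe windows" as the fast path is an assumption, since the comment is cut off in the diff.

```python
def invoke_process_element(fn, value, windows, observes_windows):
    """Call fn for an element that may belong to several windows."""
    if not observes_windows:
        # Assumed fast path: the function never looks at the window, so a
        # single call covers every window the element belongs to.
        return [fn(value, None)]
    # Otherwise the function can see the window, so each window needs
    # its own call.
    return [fn(value, window) for window in windows]
```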




Issue Time Tracking
---

Worklog Id: (was: 92909)
Time Spent: 1h 10m  (was: 1h)

> Go windowing support
> 
>
> Key: BEAM-3303
> URL: https://issues.apache.org/jira/browse/BEAM-3303
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Add support for Window.into and windowing strategies on Node. Implement the 
> various windowing strategies Beam has: GlobalWindow, FixedWindows, etc.
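
As a small worked example of one of the strategies named above, the sketch below computes the fixed window that contains a timestamp. It is written in Python for illustration; the function name, the integer timestamps, and the half-open `[start, end)` convention are assumptions, not the Go SDK's API.

```python
def fixed_window_for(timestamp, size, offset=0):
    """Return the [start, end) fixed window containing timestamp."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Python's % is non-negative for a positive size, so timestamps before
    # the offset still land in the window that starts at or before them.
    start = timestamp - (timestamp - offset) % size
    return (start, start + size)
```

With a size of 60, a timestamp of 125 falls in the window (120, 180). A global window needs no such arithmetic because every element belongs to the same single window.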





[jira] [Work logged] (BEAM-3303) Go windowing support

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3303?focusedWorklogId=92914&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92914
 ]

ASF GitHub Bot logged work on BEAM-3303:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:18
Start Date: 19/Apr/18 23:18
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #5179: 
[BEAM-3303] Add Go Windowing support
URL: https://github.com/apache/beam/pull/5179#discussion_r182910798
 
 

 ##
 File path: sdks/go/pkg/beam/core/funcx/fn_test.go
 ##
 @@ -106,6 +117,12 @@ func TestNew(t *testing.T) {
Fn:   func(context.Context, context.Context, int) {},
Err:  errContextParam,
},
+   {
+   Name: "errWindowParamPrecedence: after EventType",
 
 Review comment:
   
   s/EventType/EventTime




Issue Time Tracking
---

Worklog Id: (was: 92914)
Time Spent: 2h  (was: 1h 50m)

> Go windowing support
> 
>
> Key: BEAM-3303
> URL: https://issues.apache.org/jira/browse/BEAM-3303
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Add support for Window.into and windowing strategies on Node. Implement the 
> various windowing strategies Beam has: GlobalWindow, FixedWindows, etc.





[jira] [Work logged] (BEAM-3303) Go windowing support

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3303?focusedWorklogId=92903&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92903
 ]

ASF GitHub Bot logged work on BEAM-3303:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:18
Start Date: 19/Apr/18 23:18
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #5179: 
[BEAM-3303] Add Go Windowing support
URL: https://github.com/apache/beam/pull/5179#discussion_r182910777
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/exec/translate.go
 ##
 @@ -68,15 +69,19 @@ func UnmarshalPlan(desc *fnpb.ProcessBundleDescriptor) (*Plan, error) {
}
 
if cid == "" {
-   u.Coder, err = b.makeCoderForPCollection(pid)
+   c, wc, err := b.makeCoderForPCollection(pid)
if err != nil {
return nil, err
}
+   u.Coder = coder.NewW(c, wc)
} else {
u.Coder, err = b.coders.Coder(cid) // Expected 
to be windowed coder
if err != nil {
return nil, err
}
+   if !coder.IsW(u.Coder) {
+   return nil, fmt.Errorf("unwindowed coder %v on DataSource %v: %v", cid, id, u.Coder)
 
 Review comment:
   
   When this error occurs, what's the fix? Is it indicative of an SDK bug, a 
pipeline author bug, or an IO author bug?




Issue Time Tracking
---

Worklog Id: (was: 92903)
Time Spent: 0.5h  (was: 20m)

> Go windowing support
> 
>
> Key: BEAM-3303
> URL: https://issues.apache.org/jira/browse/BEAM-3303
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add support for Window.into and windowing strategies on Node. Implement the 
> various windowing strategies Beam has: GlobalWindow, FixedWindows, etc.





[jira] [Work logged] (BEAM-3303) Go windowing support

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3303?focusedWorklogId=92907&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92907
 ]

ASF GitHub Bot logged work on BEAM-3303:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:18
Start Date: 19/Apr/18 23:18
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #5179: 
[BEAM-3303] Add Go Windowing support
URL: https://github.com/apache/beam/pull/5179#discussion_r182910780
 
 

 ##
 File path: sdks/go/pkg/beam/core/graph/window/windows.go
 ##
 @@ -0,0 +1,76 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package window
+
+import (
+   "fmt"
+
+   "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime"
+   "github.com/apache/beam/sdks/go/pkg/beam/core/typex"
+)
+
+var (
+   // SingleGlobalWindow is a slice of a single global window. Convenience value.
+   SingleGlobalWindow = []typex.Window{GlobalWindow{}}
+)
+
+// GlobalWindow represents the singleton, global window.
+type GlobalWindow struct{}
+
+func (GlobalWindow) MaxTimestamp() typex.EventTime {
 
 Review comment:
   
   Top-level exported declarations (even methods that satisfy an interface) 
should have a comment.




Issue Time Tracking
---

Worklog Id: (was: 92907)
Time Spent: 1h  (was: 50m)

> Go windowing support
> 
>
> Key: BEAM-3303
> URL: https://issues.apache.org/jira/browse/BEAM-3303
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add support for Window.into and windowing strategies on Node. Implement the 
> various windowing strategies Beam has: GlobalWindow, FixedWindows, etc.





[jira] [Work logged] (BEAM-3303) Go windowing support

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3303?focusedWorklogId=92911&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92911
 ]

ASF GitHub Bot logged work on BEAM-3303:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:18
Start Date: 19/Apr/18 23:18
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #5179: 
[BEAM-3303] Add Go Windowing support
URL: https://github.com/apache/beam/pull/5179#discussion_r182910790
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/exec/combine.go
 ##
 @@ -93,51 +79,38 @@ func (n *Combine) ProcessElement(ctx context.Context, value FullValue, values ..
return fmt.Errorf("invalid status for combine %v: %v", n.UID, 
n.status)
}
 
-   if n.IsPerKey {
-   // For per-key combine, all processing can be done here. Note that
-   // we do not explicitly call merge, although it may be called implicitly
-   // when adding input.
+   // Note that we do not explicitly call merge, although it may
 
 Review comment:
   
   Is this referring to the non-initial entries in values? We don't appear to 
do anything with them.




Issue Time Tracking
---

Worklog Id: (was: 92911)
Time Spent: 1.5h  (was: 1h 20m)

> Go windowing support
> 
>
> Key: BEAM-3303
> URL: https://issues.apache.org/jira/browse/BEAM-3303
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Add support for Window.into and windowing strategies on Node. Implement the 
> various windowing strategies Beam has: GlobalWindow, FixedWindows, etc.





[jira] [Work logged] (BEAM-3303) Go windowing support

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3303?focusedWorklogId=92904=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92904
 ]

ASF GitHub Bot logged work on BEAM-3303:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:18
Start Date: 19/Apr/18 23:18
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #5179: 
[BEAM-3303] Add Go Windowing support
URL: https://github.com/apache/beam/pull/5179#discussion_r182910779
 
 

 ##
 File path: sdks/go/examples/windowed_wordcount/windowed_wordcount.go
 ##
 @@ -0,0 +1,110 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
 
 Review comment:
   
   Consider adding a package doc explaining the example.
   
   // Windowed_wordcount demonstrates using Windowing with the Go SDK. 
   // Windowing is a concept in the Beam model that subdivides a PCollection 
based on the
   // EventTime of an element.
   // See the Beam Programming guide at 
https://beam.apache.org/documentation/programming-guide/#windowing for more 
information.




Issue Time Tracking
---

Worklog Id: (was: 92904)
Time Spent: 40m  (was: 0.5h)

> Go windowing support
> 
>
> Key: BEAM-3303
> URL: https://issues.apache.org/jira/browse/BEAM-3303
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add support for Window.into and windowing strategies on Node. Implement the 
> various windowing strategies Beam has: GlobalWindow, FixedWindows, etc.





[jira] [Work logged] (BEAM-3303) Go windowing support

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3303?focusedWorklogId=92908=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92908
 ]

ASF GitHub Bot logged work on BEAM-3303:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:18
Start Date: 19/Apr/18 23:18
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #5179: 
[BEAM-3303] Add Go Windowing support
URL: https://github.com/apache/beam/pull/5179#discussion_r182910775
 
 

 ##
 File path: sdks/go/pkg/beam/core/graph/mtime/time.go
 ##
 @@ -0,0 +1,123 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// Package mtime contains a millisecond representation of time. The purpose
+// of this representation is alignment with the Beam specification, where we
+// need extreme values outside the range of time.Time for windowing.
+package mtime
+
+import (
+   "fmt"
+   "math"
+   "time"
+)
+
+const (
+   // MinTimestamp is the minimum value for any Beam timestamp. Often 
referred to
+   // as "-infinity". This value and MaxTimestamp are chosen so that their
+   // microseconds-since-epoch can be safely represented with an int64 and 
boundary
+   // values can be represented correctly with milli-second precision.
+   MinTimestamp Time = math.MinInt64 / 1000
+
+   // MaxTimestamp is the maximum value for any Beam timestamp. Often 
referred to
+   // as "-infinity".
 
 Review comment:
   
   "infinity"




Issue Time Tracking
---

Worklog Id: (was: 92908)

> Go windowing support
> 
>
> Key: BEAM-3303
> URL: https://issues.apache.org/jira/browse/BEAM-3303
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add support for Window.into and windowing strategies on Node. Implement the 
> various windowing strategies Beam has: GlobalWindow, FixedWindows, etc.





[jira] [Work logged] (BEAM-3303) Go windowing support

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3303?focusedWorklogId=92913=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92913
 ]

ASF GitHub Bot logged work on BEAM-3303:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:18
Start Date: 19/Apr/18 23:18
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #5179: 
[BEAM-3303] Add Go Windowing support
URL: https://github.com/apache/beam/pull/5179#discussion_r182910794
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/exec/fn.go
 ##
 @@ -46,17 +48,22 @@ func Invoke(ctx context.Context, fn *funcx.Fn, opt 
*MainInput, extra ...interfac
if index, ok := fn.Context(); ok {
args[index] = ctx
}
+   if index, ok := fn.Window(); ok {
+   if len(ws) != 1 {
+   return nil, fmt.Errorf("DoFns that observe windows must 
be invoked with single window: %v", opt.Key.Windows)
 
 Review comment:
   
   When would there be two windows on a PCollection? Simply by applying 
WindowInto twice, the second being a finer grain than the first?
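
   One case where a single element carries more than one window is sliding windows, where windows overlap whenever the size exceeds the period. The sketch below shows that assignment under simplifying assumptions: `interval` and `slidingWindows` are hypothetical names, timestamps are plain non-negative int64 milliseconds, and the start computation is a simplified formula rather than the one used in the PR.

```go
package main

import "fmt"

// interval is a hypothetical half-open window [Start, End) in milliseconds.
type interval struct{ Start, End int64 }

// slidingWindows returns every window of length size, starting at a multiple
// of period, that contains ts. It assumes ts >= 0, size > 0 and period > 0.
func slidingWindows(ts, size, period int64) []interval {
	var ret []interval
	// Latest window start at or before ts.
	lastStart := ts - ts%period
	// Walk backwards while the window [start, start+size) still contains ts.
	for start := lastStart; start > ts-size; start -= period {
		ret = append(ret, interval{Start: start, End: start + size})
	}
	return ret
}

func main() {
	// Size 10 and period 5: the element at ts=7 lands in two windows.
	fmt.Println(slidingWindows(7, 10, 5)) // prints [{5 15} {0 10}]
}
```

   With size equal to period the loop yields exactly one window, which matches the fixed-window case.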




Issue Time Tracking
---

Worklog Id: (was: 92913)
Time Spent: 1h 50m  (was: 1h 40m)

> Go windowing support
> 
>
> Key: BEAM-3303
> URL: https://issues.apache.org/jira/browse/BEAM-3303
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Add support for Window.into and windowing strategies on Node. Implement the 
> various windowing strategies Beam has: GlobalWindow, FixedWindows, etc.





[jira] [Work logged] (BEAM-3303) Go windowing support

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3303?focusedWorklogId=92910=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92910
 ]

ASF GitHub Bot logged work on BEAM-3303:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:18
Start Date: 19/Apr/18 23:18
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #5179: 
[BEAM-3303] Add Go Windowing support
URL: https://github.com/apache/beam/pull/5179#discussion_r182910789
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/exec/combine.go
 ##
 @@ -93,51 +79,38 @@ func (n *Combine) ProcessElement(ctx context.Context, 
value FullValue, values ..
return fmt.Errorf("invalid status for combine %v: %v", n.UID, 
n.status)
}
 
-   if n.IsPerKey {
-   // For per-key combine, all processing can be done here. Note 
that
-   // we do not explicitly call merge, although it may be called 
implicitly
-   // when adding input.
+   // Note that we do not explicitly call merge, although it may
+   // be called implicitly when adding input.
 
-   a, err := n.newAccum(ctx, value.Elm)
-   if err != nil {
-   return n.fail(err)
-   }
-   first := true
-
-   stream := values[0].Open()
-   for {
-   v, err := stream.Read()
-   if err != nil {
-   if err == io.EOF {
-   break
-   }
-   return n.fail(err)
-   }
+   a, err := n.newAccum(ctx, value.Elm)
+   if err != nil {
+   return n.fail(err)
+   }
+   first := true
 
-   a, err = n.addInput(ctx, a, value.Elm, v.Elm, 
value.Timestamp, first)
-   if err != nil {
-   return n.fail(err)
+   stream := values[0].Open()
+   for {
+   v, err := stream.Read()
+   if err != nil {
+   if err == io.EOF {
+   break
}
-   first = false
+   return n.fail(err)
}
-   stream.Close()
 
-   out, err := n.extract(ctx, a)
+   a, err = n.addInput(ctx, a, value.Elm, v.Elm, value.Timestamp, 
first)
if err != nil {
return n.fail(err)
}
-   return n.Out.ProcessElement(ctx, FullValue{Elm: value.Elm, 
Elm2: out, Timestamp: value.Timestamp})
+   first = false
}
+   stream.Close()
 
 Review comment:
   
   This behaviour was present previously, but consider deferring the Close 
right after the Open call: there are early returns, and the stream may be left 
open when there's a panic.
   
   That said, if none of the other calls can panic, consider explicitly closing 
before each early return, if the defer cost is a concern.
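
   A minimal sketch of the deferred variant, assuming a simplified stream: the `stream` type, `sum`, and `errNegative` below are hypothetical and only mimic the Open/Read/Close shape of the code under review.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// stream is a hypothetical stand-in for the exec package's stream type.
type stream struct {
	vals   []int
	pos    int
	closed bool
}

// Read returns the next value, or io.EOF when the stream is exhausted.
func (s *stream) Read() (int, error) {
	if s.pos >= len(s.vals) {
		return 0, io.EOF
	}
	v := s.vals[s.pos]
	s.pos++
	return v, nil
}

// Close marks the stream as closed.
func (s *stream) Close() error {
	s.closed = true
	return nil
}

var errNegative = errors.New("negative input")

// sum adds up all values. The deferred Close runs on the EOF path, on the
// early error returns, and if anything in the loop panics.
func sum(s *stream) (int, error) {
	defer s.Close()
	total := 0
	for {
		v, err := s.Read()
		if err != nil {
			if err == io.EOF {
				return total, nil
			}
			return 0, err
		}
		if v < 0 {
			return 0, errNegative // early return; stream is still closed
		}
		total += v
	}
}

func main() {
	ok := &stream{vals: []int{1, 2, 3}}
	total, err := sum(ok)
	fmt.Println(total, err, ok.closed) // prints 6 <nil> true

	bad := &stream{vals: []int{1, -1}}
	_, err = sum(bad)
	fmt.Println(err, bad.closed) // prints negative input true
}
```

   The explicit-close alternative avoids the defer overhead but needs a Close call before every return, which is easy to miss when a new early return is added later.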




Issue Time Tracking
---

Worklog Id: (was: 92910)
Time Spent: 1h 20m  (was: 1h 10m)

> Go windowing support
> 
>
> Key: BEAM-3303
> URL: https://issues.apache.org/jira/browse/BEAM-3303
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Add support for Window.into and windowing strategies on Node. Implement the 
> various windowing strategies Beam has: GlobalWindow, FixedWindows, etc.





[jira] [Work logged] (BEAM-3303) Go windowing support

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3303?focusedWorklogId=92912=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92912
 ]

ASF GitHub Bot logged work on BEAM-3303:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:18
Start Date: 19/Apr/18 23:18
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #5179: 
[BEAM-3303] Add Go Windowing support
URL: https://github.com/apache/beam/pull/5179#discussion_r182910796
 
 

 ##
 File path: sdks/go/pkg/beam/core/graph/coder/time_test.go
 ##
 @@ -18,32 +18,25 @@ package coder
 import (
"bytes"
"testing"
-   "time"
 
-   "github.com/apache/beam/sdks/go/pkg/beam/core/typex"
+   "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime"
 )
 
 func TestEncodeDecodeEventTime(t *testing.T) {
tests := []struct {
-   time        time.Time
-   errExpected bool
+   time mtime.Time
}{
-   {time: time.Unix(0, 0)},
-   {time: time.Unix(10, 0)},
-   {time: time.Unix(1257894000, 0)},
-   {time: time.Unix(0, 1257894)},
-   {time: time.Time{}, errExpected: true},
+   {time: mtime.ZeroTimestamp},
+   {time: mtime.MinTimestamp},
+   {time: mtime.MaxTimestamp},
+   {time: mtime.Now()},
 
 Review comment:
   
   While *Now* is admirable to add as a test, it tells you too late that 
there's a problem. Consider Now() + a month instead, which at least gives 
advance notice.
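
   The suggestion can be sketched as a round-trip check that includes a timestamp one month ahead. The `Time` type, `fromTime`, `encode`, and `decode` below are hypothetical stand-ins, not the coder from the PR; the fixed-width big-endian encoding is an assumption chosen only to keep the sketch self-contained.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"time"
)

// Time is a hypothetical millisecond timestamp, modeled on mtime.Time.
type Time int64

// fromTime truncates a time.Time to millisecond precision.
func fromTime(t time.Time) Time {
	return Time(t.UnixNano() / int64(time.Millisecond))
}

// encode writes the timestamp as 8 big-endian bytes (assumed wire format).
func encode(t Time) []byte {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, uint64(t))
	return buf
}

// decode is the inverse of encode.
func decode(b []byte) Time {
	return Time(binary.BigEndian.Uint64(b))
}

// roundTrips reports whether t survives an encode/decode cycle.
func roundTrips(t Time) bool {
	return decode(encode(t)) == t
}

func main() {
	now := time.Now()
	cases := []Time{
		0,
		fromTime(now),
		// One month ahead: a failure here shows up before it affects
		// timestamps that are actually in use.
		fromTime(now.AddDate(0, 1, 0)),
	}
	for _, c := range cases {
		fmt.Println(c, roundTrips(c))
	}
}
```

   Either way the case depends on the wall clock, so a fixed future constant may be worth considering if reproducible test runs matter more than the early warning.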




Issue Time Tracking
---

Worklog Id: (was: 92912)
Time Spent: 1h 40m  (was: 1.5h)

> Go windowing support
> 
>
> Key: BEAM-3303
> URL: https://issues.apache.org/jira/browse/BEAM-3303
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Add support for Window.into and windowing strategies on Node. Implement the 
> various windowing strategies Beam has: GlobalWindow, FixedWindows, etc.





[jira] [Work logged] (BEAM-3303) Go windowing support

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3303?focusedWorklogId=92905=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92905
 ]

ASF GitHub Bot logged work on BEAM-3303:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:18
Start Date: 19/Apr/18 23:18
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #5179: 
[BEAM-3303] Add Go Windowing support
URL: https://github.com/apache/beam/pull/5179#discussion_r182910781
 
 

 ##
 File path: sdks/go/pkg/beam/core/graph/mtime/time.go
 ##
 @@ -0,0 +1,123 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// Package mtime contains a millisecond representation of time. The purpose
+// of this representation is alignment with the Beam specification, where we
+// need extreme values outside the range of time.Time for windowing.
+package mtime
+
+import (
+   "fmt"
+   "math"
+   "time"
+)
+
+const (
+   // MinTimestamp is the minimum value for any Beam timestamp. Often 
referred to
+   // as "-infinity". This value and MaxTimestamp are chosen so that their
+   // microseconds-since-epoch can be safely represented with an int64 and 
boundary
+   // values can be represented correctly with milli-second precision.
+   MinTimestamp Time = math.MinInt64 / 1000
+
+   // MaxTimestamp is the maximum value for any Beam timestamp. Often 
referred to
+   // as "-infinity".
+   MaxTimestamp Time = math.MaxInt64 / 1000
+
+   // EndOfGlobalWindowTime is the timestamp at the end of the global 
window. It
+   // is a day before the max timestamp.
+   EndOfGlobalWindowTime = MaxTimestamp - 24*60*60*1000
+
+   // ZeroTimestamp is the default zero value time. It corresponds to the 
unix epoch.
+   ZeroTimestamp Time = 0
+)
+
+// Time is the number of milli-seconds since the Unix epoch. The valid range of
+// times is bounded by what can be represented a _micro_-seconds-since-epoch.
+type Time int64
+
+// Now returns the current time.
+func Now() Time {
+   return FromTime(time.Now())
+}
+
+// FromMilliseconds returns a timestamp from a raw milliseconds-since-epoch 
value.
+func FromMilliseconds(unixMilliseconds int64) Time {
+   return Normalize(Time(unixMilliseconds))
+}
+
+// FromDuration returns a timestamp from a time.Duration-since-epoch value.
+func FromDuration(d time.Duration) Time {
+   return ZeroTimestamp.Add(d)
+}
+
+// FromTime returns a milli-second precision timestamp from a time.Time.
+func FromTime(t time.Time) Time {
+   return Normalize(Time(n2m(t.UnixNano())))
+}
+
+// Milliseconds returns the number of milli-seconds since the Unix epoch.
+func (t Time) Milliseconds() int64 {
+   return int64(t)
+}
+
+// Add returns the time plus the duration.
+func (t Time) Add(d time.Duration) Time {
+   return Normalize(Time(int64(t) + n2m(d.Nanoseconds())))
+}
+
+// Subtract returns the time minus the duration.
+func (t Time) Subtract(d time.Duration) Time {
+   return Normalize(Time(int64(t) - n2m(d.Nanoseconds())))
+}
+
+func (t Time) String() string {
+   switch t {
+   case MinTimestamp:
+   return "-inf"
+   case MaxTimestamp:
+   return "+inf"
+   case EndOfGlobalWindowTime:
+   return "glo"
+   default:
+   return fmt.Sprintf("%v", t.Milliseconds())
+   }
+}
+
+// Min returns the smallest (earliest) time.
+func Min(a, b Time) Time {
+   if int64(a) < int64(b) {
+   return a
+   } else {
+   return b
+   }
+}
+
+// Max returns the largest (latest) time.
+func Max(a, b Time) Time {
+   if int64(a) < int64(b) {
+   return b
+   } else {
+   return a
+   }
+}
+
+// Normalize ensures a Time is within [MinTimestamp,MaxTimestamp].
+func Normalize(t Time) Time {
+   return Min(Max(t, MinTimestamp), MaxTimestamp)
+}
+
+func n2m(v int64) int64 {
 
 Review comment:
   
   Consider:
   // n2m converts nanoseconds to milliseconds.
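
   For context, a sketch of the helper with that comment applied. The function body is not visible in the quoted diff, so the integer division below is an assumption about what it does.

```go
package main

import (
	"fmt"
	"time"
)

// n2m converts nanoseconds to milliseconds.
// Assumed body: integer division, which truncates toward zero.
func n2m(v int64) int64 {
	return v / int64(time.Millisecond)
}

func main() {
	fmt.Println(n2m((1500 * time.Microsecond).Nanoseconds())) // prints 1
	fmt.Println(n2m(time.Second.Nanoseconds()))               // prints 1000
}
```

   Under this assumption sub-millisecond remainders are dropped rather than rounded, which may be worth stating in the comment too.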



[jira] [Work logged] (BEAM-3303) Go windowing support

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3303?focusedWorklogId=92906=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92906
 ]

ASF GitHub Bot logged work on BEAM-3303:


Author: ASF GitHub Bot
Created on: 19/Apr/18 23:18
Start Date: 19/Apr/18 23:18
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #5179: 
[BEAM-3303] Add Go Windowing support
URL: https://github.com/apache/beam/pull/5179#discussion_r182910776
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/exec/window.go
 ##
 @@ -0,0 +1,92 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package exec
+
+import (
+   "context"
+   "fmt"
+   "time"
+
+   "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime"
+   "github.com/apache/beam/sdks/go/pkg/beam/core/graph/window"
+   "github.com/apache/beam/sdks/go/pkg/beam/core/typex"
+)
+
+// WindowInto places each element in one or more windows.
+type WindowInto struct {
+   UID UnitID
+   Fn  *window.Fn
+   Out Node
+}
+
+func (w *WindowInto) ID() UnitID {
+   return w.UID
+}
+
+func (w *WindowInto) Up(ctx context.Context) error {
+   return nil
+}
+
+func (w *WindowInto) StartBundle(ctx context.Context, id string, data 
DataManager) error {
+   return w.Out.StartBundle(ctx, id, data)
+}
+
+func (w *WindowInto) ProcessElement(ctx context.Context, elm FullValue, values 
...ReStream) error {
+   windowed := FullValue{
+   Windows:   assignWindows(w.Fn, elm.Timestamp),
+   Timestamp: elm.Timestamp,
+   Elm:   elm.Elm,
+   Elm2:  elm.Elm2,
+   }
+   return w.Out.ProcessElement(ctx, windowed, values...)
+}
+
+func assignWindows(wfn *window.Fn, ts typex.EventTime) []typex.Window {
+   switch wfn.Kind {
+   case window.GlobalWindows:
+   return window.SingleGlobalWindow
+
+   case window.FixedWindows:
+   start := ts - (ts.Add(wfn.Size) % mtime.FromDuration(wfn.Size))
+   end := mtime.Min(start.Add(wfn.Size), 
mtime.EndOfGlobalWindowTime.Add(time.Millisecond))
+   return []typex.Window{window.IntervalWindow{Start: start, End: 
end}}
+
+   case window.SlidingWindows:
+   var ret []typex.Window
+
+   period := mtime.FromDuration(wfn.Period)
+   lastStart := ts - (ts.Add(wfn.Size) % period)
+   for start := lastStart; start > ts.Subtract(wfn.Size); start -= 
period {
+   ret = append(ret, window.IntervalWindow{Start: start, 
End: start.Add(wfn.Size)})
+   }
+   return ret
+
+   default:
 
 Review comment:
   
   Is there a TODO/JIRA for Session window support?




Issue Time Tracking
---

Worklog Id: (was: 92906)
Time Spent: 1h  (was: 50m)

> Go windowing support
> 
>
> Key: BEAM-3303
> URL: https://issues.apache.org/jira/browse/BEAM-3303
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Henning Rohde
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add support for Window.into and windowing strategies on Node. Implement the 
> various windowing strategies Beam has: GlobalWindow, FixedWindows, etc.





Build failed in Jenkins: beam_PostCommit_Python_Verify #4743

2018-04-19 Thread Apache Jenkins Server
See 


Changes:

[robertwb] Suppress some (fatal) warnings in schemas.

--
[...truncated 1.20 MB...]
"output_name": "out", 
"user_name": "months with tornadoes.out"
  }
], 
"parallel_input": {
  "@type": "OutputReference", 
  "output_name": "out", 
  "step_name": "s1"
}, 
"serialized_fn": "", 
"user_name": "months with tornadoes"
  }
}, 
{
  "kind": "GroupByKey", 
  "name": "s3", 
  "properties": {
"display_data": [], 
"output_info": [
  {
"encoding": {
  "@type": "kind:windowed_value", 
  "component_encodings": [
{
  "@type": "kind:pair", 
  "component_encodings": [
{
  "@type": 
"VarIntCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxhiUWeeSXOIA5XIYNmYyFjbSFTkh4A89cR+g==",
 
  "component_encodings": []
}, 
{
  "@type": "kind:stream", 
  "component_encodings": [
{
  "@type": 
"VarIntCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxhiUWeeSXOIA5XIYNmYyFjbSFTkh4A89cR+g==",
 
  "component_encodings": []
}
  ], 
  "is_stream_like": true
}
  ], 
  "is_pair_like": true
}, 
{
  "@type": "kind:global_window"
}
  ], 
  "is_wrapper": true
}, 
"output_name": "out", 
"user_name": "monthly count/GroupByKey.out"
  }
], 
"parallel_input": {
  "@type": "OutputReference", 
  "output_name": "out", 
  "step_name": "s2"
}, 
"serialized_fn": 
"%0AD%22B%0A%1Dref_Coder_GlobalWindowCoder_1%12%21%0A%1F%0A%1D%0A%1Bbeam%3Acoder%3Aglobal_window%3Av1jT%0A%25%0A%23%0A%21beam%3Awindowfn%3Aglobal_windows%3Av0.1%10%01%1A%1Dref_Coder_GlobalWindowCoder_1%22%02%3A%00%28%010%018%01H%01",
 
"user_name": "monthly count/GroupByKey"
  }
}, 
{
  "kind": "CombineValues", 
  "name": "s4", 
  "properties": {
"display_data": [], 
"encoding": {
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}, 
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}
  ], 
  "is_pair_like": true
}, 
"output_info": [
  {
"encoding": {
  "@type": "kind:windowed_value", 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": [
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}, 
{
  "@type": 
"FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/",
 
  "component_encodings": []
}
  ], 
  "is_pair_like": true
}, 
{
  "@type": "kind:global_window"
}
  ], 
  "is_wrapper": true
}, 
"output_name": "out", 
"user_name": "monthly count/Combine.out"
  }
], 
"parallel_input": {
  "@type": "OutputReference", 
  "output_name": "out", 
  "step_name": "s3"
}, 
"serialized_fn": "", 
"user_name": "monthly count/Combine"
  }
}, 
{
  "kind": "ParallelDo", 
  "name": "s5", 
  "properties": {
"display_data": [
  {
"key": "fn", 
"label": "Transform 

[jira] [Assigned] (BEAM-3883) Python SDK stages artifacts when talking to job server

2018-04-19 Thread Ankur Goenka (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka reassigned BEAM-3883:
--

Assignee: Ankur Goenka  (was: Ahmet Altay)

> Python SDK stages artifacts when talking to job server
> --
>
> Key: BEAM-3883
> URL: https://issues.apache.org/jira/browse/BEAM-3883
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Ben Sidhom
>Assignee: Ankur Goenka
>Priority: Major
>
> The Python SDK does not currently stage its user-defined functions or 
> dependencies when talking to the job API. Artifacts that need to be staged 
> include the user code itself, any SDK components not included in the 
> container image, and the list of Python packages that must be installed at 
> runtime.
>  
> Artifacts that are currently expected can be found in the harness boot code: 
> [https://github.com/apache/beam/blob/58e3b06bee7378d2d8db1c8dd534b415864f63e1/sdks/python/container/boot.go#L52].





[jira] [Work logged] (BEAM-4044) Take advantage of Calcite DDL

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4044?focusedWorklogId=92887=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92887
 ]

ASF GitHub Bot logged work on BEAM-4044:


Author: ASF GitHub Bot
Created on: 19/Apr/18 22:35
Start Date: 19/Apr/18 22:35
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #5154: [BEAM-4044] [SQL] 
Make BeamCalciteTable self planning
URL: https://github.com/apache/beam/pull/5154#issuecomment-382901428
 
 
   run java precommit




Issue Time Tracking
---

Worklog Id: (was: 92887)
Time Spent: 5h 50m  (was: 5h 40m)

> Take advantage of Calcite DDL
> -
>
> Key: BEAM-4044
> URL: https://issues.apache.org/jira/browse/BEAM-4044
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> In Calcite 1.15, support for abstract DDL moved into Calcite core. We should 
> take advantage of that.





[jira] [Work logged] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3773?focusedWorklogId=92888=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92888
 ]

ASF GitHub Bot logged work on BEAM-3773:


Author: ASF GitHub Bot
Created on: 19/Apr/18 22:35
Start Date: 19/Apr/18 22:35
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #5173: [BEAM-3773][SQL] Add 
EnumerableConverter for JDBC support
URL: https://github.com/apache/beam/pull/5173#issuecomment-382901518
 
 
   run java precommit




Issue Time Tracking
---

Worklog Id: (was: 92888)
Time Spent: 50m  (was: 40m)

> [SQL] Investigate JDBC interface for Beam SQL
> -
>
> Key: BEAM-3773
> URL: https://issues.apache.org/jira/browse/BEAM-3773
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> JDBC allows integration with a lot of third-party tools, e.g. 
> [Zeppelin|https://zeppelin.apache.org/docs/0.7.0/manual/interpreters.html] and 
> [sqlline|https://github.com/julianhyde/sqlline]. We should look into how 
> feasible it is to implement a JDBC interface for Beam SQL.





[jira] [Created] (BEAM-4147) Portable runner Job API abstractions

2018-04-19 Thread Ben Sidhom (JIRA)
Ben Sidhom created BEAM-4147:


 Summary: Portable runner Job API abstractions
 Key: BEAM-4147
 URL: https://issues.apache.org/jira/browse/BEAM-4147
 Project: Beam
  Issue Type: New Feature
  Components: runner-core
Reporter: Ben Sidhom
Assignee: Axel Magnuson


We need a way to wire arbitrary runner artifact storage backends into the 
job server and through to artifact staging on workers. This requires some new 
abstractions in front of the job service.





[jira] [Work logged] (BEAM-3714) JdbcIO.read() should create a forward-only, read-only result set

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3714?focusedWorklogId=92880=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92880
 ]

ASF GitHub Bot logged work on BEAM-3714:


Author: ASF GitHub Bot
Created on: 19/Apr/18 22:21
Start Date: 19/Apr/18 22:21
Worklog Time Spent: 10m 
  Work Description: jkff commented on issue #5109: [BEAM-3714]modified 
result set to be forward only and read only
URL: https://github.com/apache/beam/pull/5109#issuecomment-382898723
 
 
   The Go precommit is a flake, but in Java there seem to be a couple of final 
simple checkstyle errors (line too long) remaining.


Issue Time Tracking
---

Worklog Id: (was: 92880)
Time Spent: 3h 20m  (was: 3h 10m)

> JdbcIO.read() should create a forward-only, read-only result set
> 
>
> Key: BEAM-3714
> URL: https://issues.apache.org/jira/browse/BEAM-3714
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-jdbc
>Reporter: Eugene Kirpichov
>Assignee: Innocent
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> [https://stackoverflow.com/questions/48784889/streaming-data-from-cloudsql-into-dataflow/48819934#48819934]
>  - a user is trying to load a large table from MySQL, and the MySQL JDBC 
> driver requires special measures when loading large result sets.
> JdbcIO currently calls simply "connection.prepareStatement(query)" 
> https://github.com/apache/beam/blob/bb8c12c4956cbe3c6f2e57113e7c0ce2a5c05009/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L508
>  - it should specify type TYPE_FORWARD_ONLY and concurrency CONCUR_READ_ONLY 
> - these values should always be used.
> Seems that different databases have different requirements for streaming 
> result sets.
> E.g. MySQL requires setting fetch size; PostgreSQL says "The Connection must 
> not be in autocommit mode." 
> https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor . 
> Oracle, I think, doesn't have any special requirements but I don't know. 
> Fetch size should probably still be set to a reasonably large value.
> Seems that the common denominator of these requirements is: set fetch size to 
> a reasonably large but not maximum value; disable autocommit (there's nothing 
> to commit in read() anyway).
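
A minimal Java sketch of the common denominator described in this issue: a forward-only, read-only result set, autocommit disabled, and an explicit fetch size. The class name `StreamingReadSketch`, the method `prepareForStreaming`, and the fetch size of 50,000 are illustrative assumptions; they are not taken from JdbcIO or from the pull request. Only standard `java.sql` calls are used, and driver-specific behavior still applies, so the right fetch size may differ per database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class StreamingReadSketch {
  // Assumed value: the issue only asks for "reasonably large but not maximum".
  static final int ASSUMED_FETCH_SIZE = 50_000;

  /** Prepares a query so that drivers can stream rows instead of buffering them all. */
  static PreparedStatement prepareForStreaming(Connection connection, String query)
      throws SQLException {
    // PostgreSQL only uses a cursor when the connection is not in autocommit mode;
    // a read has nothing to commit, so turning it off is harmless.
    connection.setAutoCommit(false);

    // Explicitly request a forward-only, read-only result set.
    PreparedStatement statement =
        connection.prepareStatement(
            query, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);

    // Hint to the driver how many rows to fetch per round trip.
    statement.setFetchSize(ASSUMED_FETCH_SIZE);
    return statement;
  }
}
```

Callers would still iterate the `ResultSet` and close the statement and connection as usual; the sketch only covers the statement setup that the issue discusses.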





[jira] [Work logged] (BEAM-3714) JdbcIO.read() should create a forward-only, read-only result set

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3714?focusedWorklogId=92884&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92884
 ]

ASF GitHub Bot logged work on BEAM-3714:


Author: ASF GitHub Bot
Created on: 19/Apr/18 22:21
Start Date: 19/Apr/18 22:21
Worklog Time Spent: 10m 
  Work Description: jkff commented on issue #5109: [BEAM-3714]modified 
result set to be forward only and read only
URL: https://github.com/apache/beam/pull/5109#issuecomment-382898800
 
 
   You can reproduce by running ./gradlew :beam-sdks-java-io-jdbc:checkstyleMain


Issue Time Tracking
---

Worklog Id: (was: 92884)
Time Spent: 3.5h  (was: 3h 20m)

> JdbcIO.read() should create a forward-only, read-only result set
> 
>
> Key: BEAM-3714
> URL: https://issues.apache.org/jira/browse/BEAM-3714
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-jdbc
>Reporter: Eugene Kirpichov
>Assignee: Innocent
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-4038) Support Kafka Headers in KafkaIO

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4038?focusedWorklogId=92881&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92881
 ]

ASF GitHub Bot logged work on BEAM-4038:


Author: ASF GitHub Bot
Created on: 19/Apr/18 22:21
Start Date: 19/Apr/18 22:21
Worklog Time Spent: 10m 
  Work Description: rangadi commented on a change in pull request #5111: 
[BEAM-4038] Support Kafka Headers in KafkaIO
URL: https://github.com/apache/beam/pull/5111#discussion_r182899943
 
 

 ##
 File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaRecordCoder.java
 ##
 @@ -66,9 +76,34 @@ public void encode(KafkaRecord value, OutputStream 
outStream) throws IOExc
 longCoder.decode(inStream),
 longCoder.decode(inStream),
 KafkaTimestampType.forOrdinal(intCoder.decode(inStream)),
+(Headers) toHeaders(headerCoder.decode(inStream)),
 kvCoder.decode(inStream));
   }
 
+  private Object toHeaders(Iterable> records) {
+if (!records.iterator().hasNext()){
+  return null;
+}
+
+// ConsumerRecord is used to simply create a list of headers
+ConsumerRecord consumerRecord = new ConsumerRecord<>(
 
 Review comment:
   Change this to 
   ```
   Headers headers = ... 
   records.forEach(... headers.add());
   ```
   Also use empty strings rather than "key", "value" etc.


Issue Time Tracking
---

Worklog Id: (was: 92881)
Time Spent: 7h 10m  (was: 7h)

> Support Kafka Headers in KafkaIO
> 
>
> Key: BEAM-4038
> URL: https://issues.apache.org/jira/browse/BEAM-4038
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Geet Kumar
>Assignee: Raghu Angadi
>Priority: Minor
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Headers have been added to Kafka Consumer/Producer records (KAFKA-4208). The 
> purpose of this JIRA is to support this feature in KafkaIO.  
>  





[jira] [Work logged] (BEAM-4038) Support Kafka Headers in KafkaIO

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4038?focusedWorklogId=92882&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92882
 ]

ASF GitHub Bot logged work on BEAM-4038:


Author: ASF GitHub Bot
Created on: 19/Apr/18 22:21
Start Date: 19/Apr/18 22:21
Worklog Time Spent: 10m 
  Work Description: rangadi commented on a change in pull request #5111: 
[BEAM-4038] Support Kafka Headers in KafkaIO
URL: https://github.com/apache/beam/pull/5111#discussion_r182901167
 
 

 ##
 File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaRecordCoder.java
 ##
 @@ -66,9 +76,34 @@ public void encode(KafkaRecord value, OutputStream 
outStream) throws IOExc
 longCoder.decode(inStream),
 longCoder.decode(inStream),
 KafkaTimestampType.forOrdinal(intCoder.decode(inStream)),
+(Headers) toHeaders(headerCoder.decode(inStream)),
 kvCoder.decode(inStream));
   }
 
+  private Object toHeaders(Iterable> records) {
+if (!records.iterator().hasNext()){
+  return null;
 
 Review comment:
   When headers are supported, this should be an empty iterable of headers (to 
match ConsumerRecord and to avoid returning nulls to the user).
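
A dependency-free Java sketch of the shape these review comments point at: build the header collection directly with `forEach`, and return an empty collection rather than null when there is nothing to decode. `HeaderConversionSketch` and `SimpleHeader` are hypothetical stand-ins for the real coder and for Kafka's header type, and `Map.Entry<String, byte[]>` is an assumed element type, since the actual generic parameters are not visible in the diff above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class HeaderConversionSketch {
  /** Hypothetical stand-in for a Kafka header: a key plus raw bytes. */
  static final class SimpleHeader {
    final String key;
    final byte[] value;

    SimpleHeader(String key, byte[] value) {
      this.key = key;
      this.value = value;
    }
  }

  /**
   * Copies decoded key/value pairs into a header list.
   * An empty input yields an empty list, never null.
   */
  static List<SimpleHeader> toHeaders(Iterable<Map.Entry<String, byte[]>> records) {
    List<SimpleHeader> headers = new ArrayList<>();
    records.forEach(kv -> headers.add(new SimpleHeader(kv.getKey(), kv.getValue())));
    return headers;
  }
}
```

Returning an empty list lets callers iterate without a null check, which is the behavior the comment asks for.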


Issue Time Tracking
---

Worklog Id: (was: 92882)
Time Spent: 7h 20m  (was: 7h 10m)

> Support Kafka Headers in KafkaIO
> 
>
> Key: BEAM-4038
> URL: https://issues.apache.org/jira/browse/BEAM-4038
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Geet Kumar
>Assignee: Raghu Angadi
>Priority: Minor
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-4038) Support Kafka Headers in KafkaIO

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4038?focusedWorklogId=92883&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92883
 ]

ASF GitHub Bot logged work on BEAM-4038:


Author: ASF GitHub Bot
Created on: 19/Apr/18 22:21
Start Date: 19/Apr/18 22:21
Worklog Time Spent: 10m 
  Work Description: rangadi commented on a change in pull request #5111: 
[BEAM-4038] Support Kafka Headers in KafkaIO
URL: https://github.com/apache/beam/pull/5111#discussion_r182899063
 
 

 ##
 File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConsumerSpEL.java
 ##
 @@ -139,4 +156,11 @@ public long offsetForTime(Consumer consumer, 
TopicPartition topicPartition
   return offsetAndTimestamp.offset();
 }
   }
+
+  public Headers getHeaders(ConsumerRecord rawRecord) {
 
 Review comment:
   Since hasHeaders is checked directly, this method can be removed.


Issue Time Tracking
---

Worklog Id: (was: 92883)
Time Spent: 7.5h  (was: 7h 20m)

> Support Kafka Headers in KafkaIO
> 
>
> Key: BEAM-4038
> URL: https://issues.apache.org/jira/browse/BEAM-4038
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Geet Kumar
>Assignee: Raghu Angadi
>Priority: Minor
>  Time Spent: 7.5h
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-3697) Add errorprone to gradle builds

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3697?focusedWorklogId=92879&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92879
 ]

ASF GitHub Bot logged work on BEAM-3697:


Author: ASF GitHub Bot
Created on: 19/Apr/18 22:11
Start Date: 19/Apr/18 22:11
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5161: [BEAM-3697] Add 
errorprone to Gradle build
URL: https://github.com/apache/beam/pull/5161#issuecomment-382896461
 
 
   run java precommit


Issue Time Tracking
---

Worklog Id: (was: 92879)
Time Spent: 1h 10m  (was: 1h)

> Add errorprone to gradle builds
> ---
>
> Key: BEAM-3697
> URL: https://issues.apache.org/jira/browse/BEAM-3697
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> [http://errorprone.info/] is a good static checker that covers a number of 
> bugs not covered by FindBugs or Checkstyle. We use it internally at Google 
> and, when run on the Beam codebase, it occasionally uncovers issues missed 
> during PR review process.
>  
> It has Maven and Gradle plugins:
> [http://errorprone.info/docs/installation]
> [https://github.com/tbroyer/gradle-errorprone-plugin]
>  
> It would be good to integrate it into our Maven and Gradle builds.





[jira] [Created] (BEAM-4146) Python SDK sets environment in portable pipelines

2018-04-19 Thread Ben Sidhom (JIRA)
Ben Sidhom created BEAM-4146:


 Summary: Python SDK sets environment in portable pipelines
 Key: BEAM-4146
 URL: https://issues.apache.org/jira/browse/BEAM-4146
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Ben Sidhom
Assignee: Ahmet Altay


Environments must be set in any non-runner-executed transforms. See 
[https://github.com/bsidhom/beam/commit/0362fd1f25] for a possible approach 
until canonical image URLs are created.




