[jira] [Work logged] (BEAM-5239) Allow configure latencyTrackingInterval

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5239?focusedWorklogId=139131&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139131
 ]

ASF GitHub Bot logged work on BEAM-5239:


Author: ASF GitHub Bot
Created on: 29/Aug/18 07:07
Start Date: 29/Aug/18 07:07
Worklog Time Spent: 10m 
  Work Description: JozoVilcek commented on a change in pull request #6278: 
[BEAM-5239] Enable to configure latencyTrackingInterval
URL: https://github.com/apache/beam/pull/6278#discussion_r213566235
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 ##
 @@ -171,4 +176,12 @@ public static StreamExecutionEnvironment createStreamExecutionEnvironment(
 
     return flinkStreamEnv;
   }
+
+  private static void applyLatencyTrackingInterval(
+      ExecutionConfig config, FlinkPipelineOptions options) {
+    long latencyTrackingInterval = options.getLatencyTrackingInterval();
+    if (latencyTrackingInterval != -1) {
 
 Review comment:
   latencyTrackingInterval = 0 will disable the feature. This is what I need 
(the reason is `FLINK-10226`).
   The idea (taken from the other options above) is: if the user specifies 
something, forward it to Flink. Reasonable values are `interval >= 0`, but Flink 
accepts negative numbers and treats them as `0` = disabled.
   I can do `latencyTrackingInterval > 0` if you consider it a better fit. Should I?
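
To make the two guards under discussion concrete, here is a minimal, self-contained sketch. `FakeExecutionConfig` is a hypothetical stand-in for Flink's `ExecutionConfig` (it only records the value it is given), and `applyIfSet` / `applyIfPositive` are illustrative names, not code from the pull request. It assumes `-1` means "option not set", as in the diff above.

```java
import java.util.OptionalLong;

final class LatencyTrackingSketch {

  /** Hypothetical stand-in for Flink's ExecutionConfig; it only records the interval it is given. */
  static final class FakeExecutionConfig {
    private OptionalLong interval = OptionalLong.empty();

    void setLatencyTrackingInterval(long intervalMillis) {
      interval = OptionalLong.of(intervalMillis);
    }

    OptionalLong forwardedInterval() {
      return interval;
    }
  }

  /** Assumed sentinel meaning "the user did not set the option" (the default in the diff). */
  static final long UNSET = -1L;

  /** Guard from the diff: forward every explicitly set value, including 0 (disable). */
  static void applyIfSet(FakeExecutionConfig config, long interval) {
    if (interval != UNSET) {
      config.setLatencyTrackingInterval(interval);
    }
  }

  /** Alternative guard: forward only strictly positive intervals. */
  static void applyIfPositive(FakeExecutionConfig config, long interval) {
    if (interval > 0) {
      config.setLatencyTrackingInterval(interval);
    }
  }

  public static void main(String[] args) {
    // Compare what each guard forwards for "unset", "disable" and a normal interval.
    for (long interval : new long[] {UNSET, 0L, 2000L}) {
      FakeExecutionConfig a = new FakeExecutionConfig();
      FakeExecutionConfig b = new FakeExecutionConfig();
      applyIfSet(a, interval);
      applyIfPositive(b, interval);
      System.out.println(
          interval + ": != -1 forwards " + a.forwardedInterval()
              + ", > 0 forwards " + b.forwardedInterval());
    }
  }
}
```

In this sketch the only input on which the two guards differ (apart from other negative values) is an explicit 0: `!= -1` forwards it, while `> 0` drops it and leaves the runner's default in place.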


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139131)
Time Spent: 40m  (was: 0.5h)

> Allow configure latencyTrackingInterval
> ---
>
> Key: BEAM-5239
> URL: https://issues.apache.org/jira/browse/BEAM-5239
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Because of FLINK-10226, we need to be able to set 
> latencyTrackingConfiguration for flink via FlinkPipelineOptions



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5239) Allow configure latencyTrackingInterval

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5239?focusedWorklogId=139134&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139134
 ]

ASF GitHub Bot logged work on BEAM-5239:


Author: ASF GitHub Bot
Created on: 29/Aug/18 07:09
Start Date: 29/Aug/18 07:09
Worklog Time Spent: 10m 
  Work Description: JozoVilcek commented on a change in pull request #6278: 
[BEAM-5239] Enable to configure latencyTrackingInterval
URL: https://github.com/apache/beam/pull/6278#discussion_r213566671
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
 ##
 @@ -167,4 +167,11 @@
   Boolean isShutdownSourcesOnFinalWatermark();
 
   void setShutdownSourcesOnFinalWatermark(Boolean shutdownOnFinalWatermark);
+
+  @Description(
+      "Interval in milliseconds for sending latency tracking marks from the sources to the sinks.")
+  @Default.Long(-1L)
+  Long getLatencyTrackingInterval();
 
 Review comment:
   I do not follow. I do not think it is a debug option. It is more a way to 
enable and configure latency monitoring.
   For reference: 
   
https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/ExecutionConfig.java#L244
   
   Where do you suggest moving it? 




Issue Time Tracking
---

Worklog Id: (was: 139134)
Time Spent: 1h  (was: 50m)

> Allow configure latencyTrackingInterval
> ---
>
> Key: BEAM-5239
> URL: https://issues.apache.org/jira/browse/BEAM-5239
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Because of FLINK-10226, we need to be able to set 
> latencyTrackingConfiguration for flink via FlinkPipelineOptions





[jira] [Work logged] (BEAM-5239) Allow configure latencyTrackingInterval

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5239?focusedWorklogId=139133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139133
 ]

ASF GitHub Bot logged work on BEAM-5239:


Author: ASF GitHub Bot
Created on: 29/Aug/18 07:09
Start Date: 29/Aug/18 07:09
Worklog Time Spent: 10m 
  Work Description: JozoVilcek commented on a change in pull request #6278: 
[BEAM-5239] Enable to configure latencyTrackingInterval
URL: https://github.com/apache/beam/pull/6278#discussion_r213566671
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
 ##
 @@ -167,4 +167,11 @@
   Boolean isShutdownSourcesOnFinalWatermark();
 
   void setShutdownSourcesOnFinalWatermark(Boolean shutdownOnFinalWatermark);
+
+  @Description(
+      "Interval in milliseconds for sending latency tracking marks from the sources to the sinks.")
+  @Default.Long(-1L)
+  Long getLatencyTrackingInterval();
 
 Review comment:
   I do not follow. I do not think it is a debug option. 
   For reference: 
   
https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/ExecutionConfig.java#L244
   
   Where do you suggest moving it? 




Issue Time Tracking
---

Worklog Id: (was: 139133)
Time Spent: 50m  (was: 40m)

> Allow configure latencyTrackingInterval
> ---
>
> Key: BEAM-5239
> URL: https://issues.apache.org/jira/browse/BEAM-5239
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Because of FLINK-10226, we need to be able to set 
> latencyTrackingConfiguration for flink via FlinkPipelineOptions





[jira] [Work logged] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5036?focusedWorklogId=139136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139136
 ]

ASF GitHub Bot logged work on BEAM-5036:


Author: ASF GitHub Bot
Created on: 29/Aug/18 07:10
Start Date: 29/Aug/18 07:10
Worklog Time Spent: 10m 
  Work Description: timrobertson100 edited a comment on issue #6289: 
[BEAM-5036] Optimize the FileBasedSink WriteOperation.moveToOutput()
URL: https://github.com/apache/beam/pull/6289#issuecomment-416716699
 
 
   I should add that I asked you to review because I think this needs 
consideration by someone with experience in the FileSystem implementations. I'm 
happy if we want to wait until after 2.7.0 is cut, so we have the full 6 weeks 
of testing before 2.8.0 to flush out any issues. 




Issue Time Tracking
---

Worklog Id: (was: 139136)
Time Spent: 1h 10m  (was: 1h)

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use rename(), which can be much more 
> efficient on some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to 
> this for the case of the HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E
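
The difference between move-by-rename and move-by-copy+delete described above can be sketched with the JDK's `java.nio.file` API on a local filesystem. This is only an illustration under stated assumptions: it does not use Beam's filesystem abstraction, and the class `MoveSketch` and its `moveToOutput` helper are hypothetical names borrowed for readability.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class MoveSketch {

  /**
   * Tries a rename first and falls back to copy+delete when the filesystem
   * cannot rename atomically. Returns true if the rename path was taken.
   */
  static boolean moveToOutput(Path source, Path destination) throws IOException {
    try {
      // A rename only touches metadata, so no file content is transferred.
      Files.move(source, destination, StandardCopyOption.ATOMIC_MOVE);
      return true;
    } catch (AtomicMoveNotSupportedException e) {
      // Fallback: transfer the bytes, then remove the temporary file.
      Files.copy(source, destination, StandardCopyOption.REPLACE_EXISTING);
      Files.delete(source);
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    // Cross-directory move: temporary file in "tmp", final file in "out".
    Path dir = Files.createTempDirectory("move-sketch");
    Path tmp = Files.createDirectory(dir.resolve("tmp")).resolve("part-0");
    Path out = Files.createDirectory(dir.resolve("out")).resolve("part-0");
    Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));

    boolean renamed = moveToOutput(tmp, out);
    System.out.println("renamed=" + renamed + ", output exists=" + Files.exists(out));
  }
}
```

Whichever branch runs, the caller sees the same result (the file at the destination, nothing left at the source); only the cost differs.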





[jira] [Work logged] (BEAM-5239) Allow configure latencyTrackingInterval

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5239?focusedWorklogId=139144&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139144
 ]

ASF GitHub Bot logged work on BEAM-5239:


Author: ASF GitHub Bot
Created on: 29/Aug/18 07:30
Start Date: 29/Aug/18 07:30
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6278: 
[BEAM-5239] Enable to configure latencyTrackingInterval
URL: https://github.com/apache/beam/pull/6278#discussion_r213568702
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 ##
 @@ -171,4 +176,12 @@ public static StreamExecutionEnvironment createStreamExecutionEnvironment(
 
     return flinkStreamEnv;
   }
+
+  private static void applyLatencyTrackingInterval(
+      ExecutionConfig config, FlinkPipelineOptions options) {
+    long latencyTrackingInterval = options.getLatencyTrackingInterval();
+    if (latencyTrackingInterval != -1) {
 
 Review comment:
   `> 0` will be more explicit, so let's go with that.




Issue Time Tracking
---

Worklog Id: (was: 139144)
Time Spent: 1h 10m  (was: 1h)

> Allow configure latencyTrackingInterval
> ---
>
> Key: BEAM-5239
> URL: https://issues.apache.org/jira/browse/BEAM-5239
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Because of FLINK-10226, we need to be able to set 
> latencyTrackingConfiguration for flink via FlinkPipelineOptions





[jira] [Work logged] (BEAM-5239) Allow configure latencyTrackingInterval

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5239?focusedWorklogId=139145&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139145
 ]

ASF GitHub Bot logged work on BEAM-5239:


Author: ASF GitHub Bot
Created on: 29/Aug/18 07:30
Start Date: 29/Aug/18 07:30
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6278: 
[BEAM-5239] Enable to configure latencyTrackingInterval
URL: https://github.com/apache/beam/pull/6278#discussion_r213571798
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
 ##
 @@ -167,4 +167,11 @@
   Boolean isShutdownSourcesOnFinalWatermark();
 
   void setShutdownSourcesOnFinalWatermark(Boolean shutdownOnFinalWatermark);
+
+  @Description(
+      "Interval in milliseconds for sending latency tracking marks from the sources to the sinks.")
+  @Default.Long(-1L)
+  Long getLatencyTrackingInterval();
 
 Review comment:
   I see. Sounds reasonable. 




Issue Time Tracking
---

Worklog Id: (was: 139145)
Time Spent: 1h 10m  (was: 1h)

> Allow configure latencyTrackingInterval
> ---
>
> Key: BEAM-5239
> URL: https://issues.apache.org/jira/browse/BEAM-5239
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Because of FLINK-10226, we need to be able to set 
> latencyTrackingConfiguration for flink via FlinkPipelineOptions





[jira] [Work logged] (BEAM-5062) Add ability to configure S3ClientOptions

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5062?focusedWorklogId=139151&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139151
 ]

ASF GitHub Bot logged work on BEAM-5062:


Author: ASF GitHub Bot
Created on: 29/Aug/18 07:48
Start Date: 29/Aug/18 07:48
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #6122: [BEAM-5062] Add 
ability to provide custom S3ClientOptions
URL: https://github.com/apache/beam/pull/6122#issuecomment-416858720
 
 
   retest this please




Issue Time Tracking
---

Worklog Id: (was: 139151)
Time Spent: 2.5h  (was: 2h 20m)

> Add ability to configure S3ClientOptions
> 
>
> Key: BEAM-5062
> URL: https://issues.apache.org/jira/browse/BEAM-5062
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> It would be very useful to be able to configure 
> [S3ClientOptions|https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/S3ClientOptions.html]
>  for Apache Beam jobs.
> For example, some implementations of S3 do not support 
> virtual-hosted-style URLs for buckets, only path-style URLs. Currently it is 
> impossible to enable path-style access for the Amazon S3 client used by 
> an Apache Beam job.





[jira] [Work logged] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?focusedWorklogId=139153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139153
 ]

ASF GitHub Bot logged work on BEAM-5107:


Author: ASF GitHub Bot
Created on: 29/Aug/18 07:53
Start Date: 29/Aug/18 07:53
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6211: 
[BEAM-5107] Support ES-6.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/6211#discussion_r213577667
 
 

 ##
 File path: 
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java
 ##
 @@ -125,7 +125,8 @@
  *
  * Optionally, you can provide {@link ElasticsearchIO.Write.FieldValueExtractFn} using {@code
  * withIndexFn()} or {@code withTypeFn()} to enable per-document routing to the target Elasticsearch
- * index and type.
+ * index (all versions) and type (version <6). Support for type routing was removed in Elasticsearch
+ * 6 (see https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch)
 
 Review comment:
   What is the status of that?




Issue Time Tracking
---

Worklog Id: (was: 139153)
Time Spent: 8h  (was: 7h 50m)

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2, but ElasticsearchIO only supports 2.x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139159&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139159
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 08:15
Start Date: 29/Aug/18 08:15
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213584107
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DockerJobBundleFactory.java
 ##
 @@ -234,7 +125,7 @@ protected ServerFactory getServerFactory() {
         // We only use the published Docker ports 8100-8200 in a round-robin fashion
         () -> MAC_PORT.getAndUpdate(val -> val == MAC_PORT_END ? MAC_PORT_START : val + 1));
   default:
-        LOG.warn("Unknown Docker platform. Falling back to default server factory");
+        logger.warn("Unknown Docker platform. Falling back to default server factory");
 
 Review comment:
   The logger is now non-static and only in the base class; that is why the 
variable name is lower case. I can create static loggers in both classes if you 
prefer that.
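
The two logger styles being weighed here can be compared in a small, self-contained sketch. It uses `java.util.logging` as a stand-in for SLF4J so that it runs without extra dependencies, and the class names `FactoryBase` and `DockerFactory` are hypothetical simplifications of the classes in the pull request.

```java
import java.util.logging.Logger;

final class LoggerSketch {

  /** Base class with a per-instance logger named after the runtime class, as in the diff. */
  abstract static class FactoryBase {
    // getClass() resolves to the concrete subclass, so subclasses need no logger of their own.
    protected final Logger logger = Logger.getLogger(getClass().getName());

    String instanceLoggerName() {
      return logger.getName();
    }
  }

  /** Subclass that additionally declares the conventional static logger. */
  static final class DockerFactory extends FactoryBase {
    // One logger per class, referenced through an upper-case constant name.
    private static final Logger LOG = Logger.getLogger(DockerFactory.class.getName());

    static String staticLoggerName() {
      return LOG.getName();
    }
  }

  public static void main(String[] args) {
    DockerFactory factory = new DockerFactory();
    System.out.println("instance logger: " + factory.instanceLoggerName());
    System.out.println("static logger:   " + DockerFactory.staticLoggerName());
  }
}
```

Both variants end up with a logger named after the concrete class. The instance field avoids repeating the declaration in each subclass, while the static field follows the usual `LOG` constant convention and adds no field to each object.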




Issue Time Tracking
---

Worklog Id: (was: 139159)
Time Spent: 1.5h  (was: 1h 20m)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similarly 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]





[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139160&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139160
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 08:15
Start Date: 29/Aug/18 08:15
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213584144
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/JobBundleFactoryBase.java
 ##
 @@ -0,0 +1,331 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.control;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.RemovalNotification;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Iterables;
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.IOException;
+import java.util.Map;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.Target;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.ServerFactory;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import org.apache.beam.runners.fnexecution.control.SdkHarnessClient.BundleProcessor;
+import org.apache.beam.runners.fnexecution.data.GrpcDataService;
+import org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.EnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.RemoteEnvironment;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.logging.Slf4jLogWriter;
+import org.apache.beam.runners.fnexecution.provisioning.JobInfo;
+import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.runners.fnexecution.state.GrpcStateService;
+import org.apache.beam.runners.fnexecution.state.StateRequestHandler;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.apache.beam.sdk.fn.IdGenerators;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.fn.stream.OutboundObserverFactory;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A {@link JobBundleFactory} that uses a {@link DockerEnvironmentFactory} for environment
+ * management. Note that returned {@link StageBundleFactory stage bundle factories} are not
+ * thread-safe. Instead, a new stage factory should be created for each client.
+ */
+@ThreadSafe
+public abstract class JobBundleFactoryBase implements JobBundleFactory {
+  protected final Logger logger = LoggerFactory.getLogger(getClass());
 
 Review comment:
   The logger is now non-static and only in the base class; that is why the 
variable name is lower case. I can create static loggers in both classes if you 
prefer that.



[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139161&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139161
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 08:19
Start Date: 29/Aug/18 08:19
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213585481
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessEnvironmentFactory.java
 ##
 @@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import com.google.common.collect.ImmutableList;
+import java.time.Duration;
+import java.util.List;
+import java.util.concurrent.TimeoutException;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.control.ControlClientPool;
+import org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService;
+import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * An {@link EnvironmentFactory} which forks processes based on the given URL in the Environment.
+ * The returned {@link ProcessEnvironment} has to make sure to stop the processes.
+ */
+public class ProcessEnvironmentFactory implements EnvironmentFactory {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ProcessEnvironmentFactory.class);
+
+  public static ProcessEnvironmentFactory create(
+  GrpcFnServer controlServiceServer,
+  GrpcFnServer loggingServiceServer,
+  GrpcFnServer retrievalServiceServer,
+  GrpcFnServer provisioningServiceServer,
+  ControlClientPool.Source clientSource,
+  IdGenerator idGenerator) {
+return create(
+ProcessManager.getDefault(),
+controlServiceServer,
+loggingServiceServer,
+retrievalServiceServer,
+provisioningServiceServer,
+clientSource,
+idGenerator);
+  }
+
+  static ProcessEnvironmentFactory create(
+  ProcessManager processManager,
+  GrpcFnServer controlServiceServer,
+  GrpcFnServer loggingServiceServer,
+  GrpcFnServer retrievalServiceServer,
+  GrpcFnServer provisioningServiceServer,
+  ControlClientPool.Source clientSource,
+  IdGenerator idGenerator) {
+return new ProcessEnvironmentFactory(
+processManager,
+controlServiceServer,
+loggingServiceServer,
+retrievalServiceServer,
+provisioningServiceServer,
+idGenerator,
+clientSource);
+  }
+
+  private final ProcessManager processManager;
+  private final GrpcFnServer controlServiceServer;
+  private final GrpcFnServer loggingServiceServer;
+  private final GrpcFnServer retrievalServiceServer;
+  private final GrpcFnServer provisioningServiceServer;
+  private final IdGenerator idGenerator;
+  private final ControlClientPool.Source clientSource;
+
+  private ProcessEnvironmentFactory(
+  ProcessManager processManager,
+  GrpcFnServer controlServiceServer,
+  GrpcFnServer loggingServiceServer,
+  GrpcFnServer retrievalServiceServer,
+  GrpcFnServer provisioningServiceServer,
+  IdGenerator idGenerator,
+  ControlClientPool.Source clientSource) {
+this.processManager = processManager;
+this.controlServiceServer = controlServiceServer;
+this.loggingServiceServer = loggingServiceServer;
+this.retrievalServiceServer = retrievalServiceServer;
+this.provisioningServiceServer = provisioningServiceServer

[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139162&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139162
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 08:20
Start Date: 29/Aug/18 08:20
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213585789
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DockerJobBundleFactory.java
 ##
 @@ -234,7 +125,7 @@ protected ServerFactory getServerFactory() {
         // We only use the published Docker ports 8100-8200 in a round-robin fashion
         () -> MAC_PORT.getAndUpdate(val -> val == MAC_PORT_END ? MAC_PORT_START : val + 1));
   default:
-        LOG.warn("Unknown Docker platform. Falling back to default server factory");
+        logger.warn("Unknown Docker platform. Falling back to default server factory");
 
 Review comment:
   Thomas also commented on this; I will revert this to static loggers.




Issue Time Tracking
---

Worklog Id: (was: 139162)
Time Spent: 2h  (was: 1h 50m)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similarly 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-5255) Fix over-aggressive division futurization.

2018-08-29 Thread Robert Bradshaw (JIRA)
Robert Bradshaw created BEAM-5255:
-

 Summary: Fix over-aggressive division futurization.
 Key: BEAM-5255
 URL: https://issues.apache.org/jira/browse/BEAM-5255
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Affects Versions: 2.6.0
Reporter: Robert Bradshaw
Assignee: Ahmet Altay


When converting from Python 2 to Python 3, `a / b` becomes `a // b` only for 
ints, but it is incorrect to do this substitution for floating point division. 

 

I noticed this change in the microbenchmarks, but we should do an audit to make 
sure we haven't broken things elsewhere. 
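
The difference between the two operators can be sketched as follows. This is a Java analogue rather than Python code; the class `DivisionSemantics` and its helpers are hypothetical, and `floorDiv` only approximates Python's `//` on floats via `Math.floor`. It shows why swapping true division for floor division is harmless for ints under Python 2 semantics but changes the result for floating point operands.

```java
public class DivisionSemantics {
  /** Models Python 3 `a / b` (true division) on floating point operands. */
  static double trueDiv(double a, double b) {
    return a / b;
  }

  /** Approximates Python `a // b` (floor division) on floating point operands. */
  static double floorDiv(double a, double b) {
    return Math.floor(a / b);
  }

  /** Models Python 2 `a / b` and Python 3 `a // b` on ints. */
  static long intFloorDiv(long a, long b) {
    return Math.floorDiv(a, b);
  }

  public static void main(String[] args) {
    System.out.println(intFloorDiv(7, 2));  // 3: same result before and after the rewrite
    System.out.println(trueDiv(7.0, 2.0));  // 3.5: what the float code computed originally
    System.out.println(floorDiv(7.0, 2.0)); // 3.0: what it computes after an over-eager rewrite
  }
}
```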



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139163&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139163
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 08:20
Start Date: 29/Aug/18 08:20
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213585817
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/JobBundleFactoryBase.java
 ##
 @@ -0,0 +1,331 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.control;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.RemovalNotification;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Iterables;
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.IOException;
+import java.util.Map;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.Target;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.ServerFactory;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import 
org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService;
+import 
org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import 
org.apache.beam.runners.fnexecution.control.SdkHarnessClient.BundleProcessor;
+import org.apache.beam.runners.fnexecution.data.GrpcDataService;
+import 
org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.EnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.RemoteEnvironment;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.logging.Slf4jLogWriter;
+import org.apache.beam.runners.fnexecution.provisioning.JobInfo;
+import 
org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.runners.fnexecution.state.GrpcStateService;
+import org.apache.beam.runners.fnexecution.state.StateRequestHandler;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.apache.beam.sdk.fn.IdGenerators;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.fn.stream.OutboundObserverFactory;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A {@link JobBundleFactory} that uses a {@link DockerEnvironmentFactory} for 
environment
+ * management. Note that returned {@link StageBundleFactory stage bundle 
factories} are not
+ * thread-safe. Instead, a new stage factory should be created for each client.
+ */
+@ThreadSafe
+public abstract class JobBundleFactoryBase implements JobBundleFactory {
+  protected final Logger logger = LoggerFactory.getLogger(getClass());
 
 Review comment:
   Thomas also commented on this, will revert this to static loggers.




Issue Time Tracking
---

Work

[jira] [Work logged] (BEAM-5255) Fix over-aggressive division futurization.

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5255?focusedWorklogId=139166&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139166
 ]

ASF GitHub Bot logged work on BEAM-5255:


Author: ASF GitHub Bot
Created on: 29/Aug/18 08:23
Start Date: 29/Aug/18 08:23
Worklog Time Spent: 10m 
  Work Description: robertwb opened a new pull request #6293: [BEAM-5255] 
Fix over-aggressive division futurization in benchmarks.
URL: https://github.com/apache/beam/pull/6293
 
 




Issue Time Tracking
---

Worklog Id: (was: 139166)
Time Spent: 10m
Remaining Estimate: 0h

> Fix over-aggressive division futurization.
> --
>
> Key: BEAM-5255
> URL: https://issues.apache.org/jira/browse/BEAM-5255
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.6.0
>Reporter: Robert Bradshaw
>Assignee: Ahmet Altay
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When converting from Python 2 to Python 3, `a / b` becomes `a // b` only for 
> ints, but it is incorrect to do this substi

[jira] [Work logged] (BEAM-5255) Fix over-aggressive division futurization.

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5255?focusedWorklogId=139167&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139167
 ]

ASF GitHub Bot logged work on BEAM-5255:


Author: ASF GitHub Bot
Created on: 29/Aug/18 08:24
Start Date: 29/Aug/18 08:24
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #6293: [BEAM-5255] Fix 
over-aggressive division futurization in benchmarks.
URL: https://github.com/apache/beam/pull/6293#issuecomment-416868865
 
 
   R: @tvalentyn 




Issue Time Tracking
---

Worklog Id: (was: 139167)
Time Spent: 20m  (was: 10m)

> Fix over-aggressive division futurization.
> --
>
> Key: BEAM-5255
> URL: https://issues.apache.org/jira/browse/BEAM-5255
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.6.0
>Reporter: Robert Bradshaw
>Assignee: Ahmet Altay
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When converting from Python 2 to Python 3, `a / b` becomes `a // b` only for 
> ints, but it is incorrect to do this substitution for floating point 
> division. 
>  
> I noticed this change in the microbenchmarks, but we should do an audit to 
> make sure we haven't broken things elsewhere. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5248) Euphoria to Beam translators should set coders to output `PCollections'

2018-08-29 Thread Vaclav Plajt (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaclav Plajt updated BEAM-5248:
---
Summary: Euphoria to Beam translators should set coders to output 
`PCollections'  (was: Euphoria to Beam translators should set coders to output 
`PCollections' rather than to input ones)

> Euphoria to Beam translators should set coders to output `PCollections'
> ---
>
> Key: BEAM-5248
> URL: https://issues.apache.org/jira/browse/BEAM-5248
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Reporter: Vaclav Plajt
>Assignee: Vaclav Plajt
>Priority: Major
>
> Euphoria to Beam translators set coders on new `PCollections` automatically. 
> Every new `PCollection` gets a coder; some translators even set coders on 
> input `PCollections`. That causes trouble when any `Dataset`/`PCollection` 
> is used as input to two or more operators. Set coders on all output 
> `PCollections` but not on input ones.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5248) Euphoria to Beam translators should set coders to output `PCollections' rather than to input ones

2018-08-29 Thread Vaclav Plajt (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaclav Plajt updated BEAM-5248:
---
Description: 
Euphoria to Beam translators set coders on new `PCollections` automatically. 
Every new `PCollection` gets a coder; some translators even set coders on 
input `PCollections`. That causes trouble when any `Dataset`/`PCollection` is 
used as input to two or more operators. Set coders on all output `PCollections` 
but not on input ones.

 

  was:
Euphoria to Beam translators sets coders to new `PCollections` automatically. 
Every new `PCollection` gets coder, event input `PCollection`'s coders are set. 
That is a trouble when any `Dataset`/`PCollection` is used as input to two or 
more operators. Set coders for output `Pcollections` but not for output ones.

 


> Euphoria to Beam translators should set coders to output `PCollections' 
> rather than to input ones
> -
>
> Key: BEAM-5248
> URL: https://issues.apache.org/jira/browse/BEAM-5248
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Reporter: Vaclav Plajt
>Assignee: Vaclav Plajt
>Priority: Major
>
> Euphoria to Beam translators set coders on new `PCollections` automatically. 
> Every new `PCollection` gets a coder; some translators even set coders on 
> input `PCollections`. That causes trouble when any `Dataset`/`PCollection` 
> is used as input to two or more operators. Set coders on all output 
> `PCollections` but not on input ones.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated (9d5c044 -> 053b1d8)

2018-08-29 Thread mxm
This is an automated email from the ASF dual-hosted git repository.

mxm pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 9d5c044  Fix typo.
 add 9839d15  [BEAM-5172] Force closing the index before deleting it.
 add 2766530  [BEAM-5172] Wait for HTTP connection to return 200 before 
starting the tests. Do not wait for ES green status because ES client already 
does retry.
 add 276b5be  [BEAM-5172] Add thread id to beam test index name to avoid 
collision in parallel tests
 new 053b1d8  Merge pull request #6279 from 
echauchot/BEAM-5172-flaky-es2-tests

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../sdk/io/elasticsearch/ElasticsearchIOTest.java| 20 ++--
 .../sdk/io/elasticsearch/ElasticsearchIOTest.java| 14 +++---
 .../io/elasticsearch/ElasticSearchIOTestUtils.java   |  5 +
 .../io/elasticsearch/ElasticsearchIOITCommon.java|  8 
 .../io/elasticsearch/ElasticsearchIOTestCommon.java  |  5 -
 5 files changed, 38 insertions(+), 14 deletions(-)



[beam] 01/01: Merge pull request #6279 from echauchot/BEAM-5172-flaky-es2-tests

2018-08-29 Thread mxm
This is an automated email from the ASF dual-hosted git repository.

mxm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit 053b1d8d82f4614ba5ad2691450b25f1654faeda
Merge: 9d5c044 276b5be
Author: Maximilian Michels 
AuthorDate: Wed Aug 29 10:36:27 2018 +0200

Merge pull request #6279 from echauchot/BEAM-5172-flaky-es2-tests

 .../sdk/io/elasticsearch/ElasticsearchIOTest.java| 20 ++--
 .../sdk/io/elasticsearch/ElasticsearchIOTest.java| 14 +++---
 .../io/elasticsearch/ElasticSearchIOTestUtils.java   |  5 +
 .../io/elasticsearch/ElasticsearchIOITCommon.java|  8 
 .../io/elasticsearch/ElasticsearchIOTestCommon.java  |  5 -
 5 files changed, 38 insertions(+), 14 deletions(-)



[jira] [Work logged] (BEAM-5172) org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest is flaky

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5172?focusedWorklogId=139172&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139172
 ]

ASF GitHub Bot logged work on BEAM-5172:


Author: ASF GitHub Bot
Created on: 29/Aug/18 08:39
Start Date: 29/Aug/18 08:39
Worklog Time Spent: 10m 
  Work Description: asfgit closed pull request #6279:  [BEAM-5172] Fix 
Elasticsearch UTests flakiness
URL: https://github.com/apache/beam/pull/6279
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git 
a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java
 
b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java
index 862ba5b4292..9920dde53ba 100644
--- 
a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java
+++ 
b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java
@@ -21,9 +21,9 @@
 import static 
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.ConnectionConfiguration;
 import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.Read;
 import static 
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.ACCEPTABLE_EMPTY_SPLITS_PERCENTAGE;
-import static 
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.ES_INDEX;
 import static 
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.ES_TYPE;
 import static 
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.NUM_DOCS_UTESTS;
+import static 
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.getEsIndex;
 import static org.apache.beam.sdk.testing.SourceTestUtils.readFromSource;
 import static org.hamcrest.Matchers.lessThan;
 import static org.junit.Assert.assertEquals;
@@ -61,6 +61,7 @@
   private static final Logger LOG = 
LoggerFactory.getLogger(ElasticsearchIOTest.class);
 
   private static final String ES_IP = "127.0.0.1";
+  private static final int MAX_STARTUP_WAITING_TIME_MSEC = 5000;
 
   private static Node node;
   private static RestClient restClient;
@@ -97,10 +98,25 @@ public static void beforeClass() throws IOException {
 node.start();
 connectionConfiguration =
 ConnectionConfiguration.create(
-new String[] {"http://" + ES_IP + ":" + esHttpPort}, ES_INDEX, 
ES_TYPE);
+new String[] {"http://" + ES_IP + ":" + esHttpPort}, getEsIndex(), 
ES_TYPE);
 restClient = connectionConfiguration.createClient();
 elasticsearchIOTestCommon =
 new ElasticsearchIOTestCommon(connectionConfiguration, restClient, 
false);
+int waitingTime = 0;
+int healthCheckFrequency = 500;
+while ((waitingTime < MAX_STARTUP_WAITING_TIME_MSEC)
+&& restClient.performRequest("HEAD", 
"/").getStatusLine().getStatusCode() != 200) {
+  try {
+Thread.sleep(healthCheckFrequency);
+waitingTime += healthCheckFrequency;
+  } catch (InterruptedException e) {
+LOG.warn(
+"Waiting thread was interrupted while waiting for connection to 
Elasticsearch to be available");
+  }
+}
+if (waitingTime >= MAX_STARTUP_WAITING_TIME_MSEC) {
+  throw new IOException("Max startup waiting for embedded Elasticsearch to 
start was exceeded");
+}
   }
 
   @AfterClass
diff --git 
a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-5/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java
 
b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-5/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java
index d2791c76d1f..b453b9f4740 100644
--- 
a/sdks/java/io/elasticsearch-tests/elasticsearch-tests-5/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java
+++ 
b/sdks/java/io/elasticsearch-tests/elasticsearch-tests-5/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java
@@ -21,12 +21,11 @@
 import static 
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.ConnectionConfiguration;
 import static org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.Read;
 import static 
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.ACCEPTABLE_EMPTY_SPLITS_PERCENTAGE;
-import static 
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.ES_INDEX;
 import static 
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.ES_TYPE;
 import static 
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.NUM_DOCS_UTESTS;
+import static 
org.apache.beam.sdk.io.elasticsearch.

[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139177&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139177
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 08:47
Start Date: 29/Aug/18 08:47
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213594055
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/JobBundleFactoryBase.java
 ##
 @@ -0,0 +1,331 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.control;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.RemovalNotification;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Iterables;
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.IOException;
+import java.util.Map;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.Target;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.ServerFactory;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import 
org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService;
+import 
org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import 
org.apache.beam.runners.fnexecution.control.SdkHarnessClient.BundleProcessor;
+import org.apache.beam.runners.fnexecution.data.GrpcDataService;
+import 
org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.EnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.RemoteEnvironment;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.logging.Slf4jLogWriter;
+import org.apache.beam.runners.fnexecution.provisioning.JobInfo;
+import 
org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.runners.fnexecution.state.GrpcStateService;
+import org.apache.beam.runners.fnexecution.state.StateRequestHandler;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.apache.beam.sdk.fn.IdGenerators;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.fn.stream.OutboundObserverFactory;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A {@link JobBundleFactory} that uses a {@link DockerEnvironmentFactory} for 
environment
+ * management. Note that returned {@link StageBundleFactory stage bundle 
factories} are not
+ * thread-safe. Instead, a new stage factory should be created for each client.
+ */
+@ThreadSafe
+public abstract class JobBundleFactoryBase implements JobBundleFactory {
+  protected final Logger logger = LoggerFactory.getLogger(getClass());
 
 Review comment:
   That's already the case because `getClass()` always returns the runtime 
class of the instance, i.e. the concrete subclass rather than the base class.
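
   The comment above relies on `getClass()` being resolved against the runtime type of the instance, even when it is called from a field initializer in an abstract base class. A minimal sketch, with hypothetical class names that are not part of Beam:

```java
public class GetClassDemo {
  // Hypothetical stand-in for a base class such as JobBundleFactoryBase.
  abstract static class FactoryBase {
    // Evaluated once per instance; getClass() yields the concrete subclass.
    private final String owner = getClass().getSimpleName();

    String owner() {
      return owner;
    }
  }

  // Hypothetical concrete subclasses.
  static final class DockerFactory extends FactoryBase {}

  static final class ProcessFactory extends FactoryBase {}

  public static void main(String[] args) {
    System.out.println(new DockerFactory().owner());  // DockerFactory
    System.out.println(new ProcessFactory().owner()); // ProcessFactory
  }
}
```

   A logger created with `getClass()` in the base class would therefore be named after the subclass, not after the base class.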




Issue Time Track

[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139179
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 08:50
Start Date: 29/Aug/18 08:50
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213585789
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DockerJobBundleFactory.java
 ##
 @@ -234,7 +125,7 @@ protected ServerFactory getServerFactory() {
 // We only use the published Docker ports 8100-8200 in a 
round-robin fashion
 () -> MAC_PORT.getAndUpdate(val -> val == MAC_PORT_END ? 
MAC_PORT_START : val + 1));
   default:
-LOG.warn("Unknown Docker platform. Falling back to default server 
factory");
+logger.warn("Unknown Docker platform. Falling back to default server 
factory");
 
 Review comment:
   edit: Would like to keep it because it will otherwise introduce multiple 
logger instances.
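
   One way to read the comment above: a single `logger` field declared in the base class serves every subclass, so subclasses need not declare their own. The sketch below illustrates that pattern under stated assumptions. It uses `java.util.logging` as a stand-in for SLF4J so that it runs without extra dependencies, and all class names are hypothetical.

```java
import java.util.logging.Logger;

public class LoggerStyles {
  // Hypothetical base class: one instance field, named after the runtime class.
  abstract static class Base {
    protected final Logger logger = Logger.getLogger(getClass().getName());

    Logger logger() {
      return logger;
    }
  }

  // Hypothetical subclasses; neither declares a logger of its own.
  static final class DockerLike extends Base {}

  static final class ProcessLike extends Base {}

  public static void main(String[] args) {
    Base first = new DockerLike();
    Base second = new DockerLike();
    // Looking up the same name again returns the same Logger while it is still referenced.
    System.out.println(first.logger() == second.logger());
    // Different subclasses get differently named loggers.
    System.out.println(first.logger().getName());
    System.out.println(new ProcessLike().logger().getName());
  }
}
```

   The trade-off is that an instance field is initialized for every object, whereas a `static final` logger is initialized once per class but has to be declared in each class that wants its own name.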




Issue Time Tracking
---

Worklog Id: (was: 139179)
Time Spent: 2.5h  (was: 2h 20m)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similarly 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139180&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139180
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 08:50
Start Date: 29/Aug/18 08:50
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213585817
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/JobBundleFactoryBase.java
 ##
 @@ -0,0 +1,331 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.control;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.RemovalNotification;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Iterables;
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.IOException;
+import java.util.Map;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.Target;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.ServerFactory;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import org.apache.beam.runners.fnexecution.control.SdkHarnessClient.BundleProcessor;
+import org.apache.beam.runners.fnexecution.data.GrpcDataService;
+import org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.EnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.RemoteEnvironment;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.logging.Slf4jLogWriter;
+import org.apache.beam.runners.fnexecution.provisioning.JobInfo;
+import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.runners.fnexecution.state.GrpcStateService;
+import org.apache.beam.runners.fnexecution.state.StateRequestHandler;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.apache.beam.sdk.fn.IdGenerators;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.fn.stream.OutboundObserverFactory;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A {@link JobBundleFactory} that uses a {@link DockerEnvironmentFactory} for environment
+ * management. Note that returned {@link StageBundleFactory stage bundle factories} are not
+ * thread-safe. Instead, a new stage factory should be created for each client.
+ */
+@ThreadSafe
+public abstract class JobBundleFactoryBase implements JobBundleFactory {
+  protected final Logger logger = LoggerFactory.getLogger(getClass());
 
 Review comment:
   edit: Would like to keep it because it will otherwise introduce multiple 
logger instances.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking

[jira] [Work logged] (BEAM-5245) Missing windowing error message is not undestandable to Euphoria user

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5245?focusedWorklogId=139181&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139181
 ]

ASF GitHub Bot logged work on BEAM-5245:


Author: ASF GitHub Bot
Created on: 29/Aug/18 08:51
Start Date: 29/Aug/18 08:51
Worklog Time Spent: 10m 
  Work Description: VaclavPlajt opened a new pull request #6294: 
[BEAM-5245] Better missing windowing error reporting
URL: https://github.com/apache/beam/pull/6294
 
 
   Euphoria API now reports missing windowing.
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 139181)
Time Spent: 10m
Remaining Estimate: 0h

> Missing windowing error message is not undestandable to Euphoria user
> -
>
> Key: BEAM-5245
> URL: https://issues.apache.org/jira/browse/BEAM-5245
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Reporter: Vaclav Plajt
>Assignee: Vaclav Plajt
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> RBK a JOIN cannot wotk w

[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139187&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139187
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 09:11
Start Date: 29/Aug/18 09:11
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #6287: [BEAM-5187] Add a 
ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#issuecomment-416882723
 
 
   Thanks for checking it out @angoenka @tweise.
   
   I've checked out your code @tweise.
   
   >From cursory test it seems error handling in the process launch needs some 
fixing. 
   
   The code actually fails if the binary path is not correct. It doesn't fail 
immediately when the process fails but it will time out. I'll introduce a check 
for the exit code when waiting for the connection from the SDK harness process.
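
   The exit-code check mentioned above could look roughly like the following. This is only a sketch under assumptions: `HarnessStartupCheck` and `awaitConnection` are hypothetical names, and the "connected" signal is modelled as a plain `CompletableFuture`, not as anything taken from the pull request.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not part of the Beam code base. */
final class HarnessStartupCheck {

  /**
   * Waits until {@code connected} completes. Between short waits it polls the
   * process, so an early exit is reported right away instead of only after the
   * full timeout has elapsed.
   */
  static void awaitConnection(
      Process process, CompletableFuture<Void> connected, long timeoutMillis)
      throws InterruptedException, ExecutionException, TimeoutException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (System.nanoTime() < deadline) {
      try {
        // Short wait so that the process state is re-checked regularly.
        connected.get(100, TimeUnit.MILLISECONDS);
        return;
      } catch (TimeoutException e) {
        if (!process.isAlive()) {
          throw new IllegalStateException(
              "Process exited with code " + process.exitValue() + " before connecting");
        }
      }
    }
    throw new TimeoutException("Timed out waiting for the process to connect");
  }
}
```

   With a check like this, a wrong binary path that makes the process exit immediately surfaces as an error within about one polling interval.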

   >We also need the ability to propagate the environment of the JVM plus set 
additional environment variables to meet the contract of the Python worker.
   
   As things stand now, setting the correct environment variables is the 
responsibility of your startup script. But you're right, it doesn't make sense 
to pass the container contract parameters because the SDK harness processes 
rely on environment variables. Since we launch the process directly, we can set 
the required environment variables for the SDK harness. 
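
   As a rough illustration of setting variables when launching the process directly: `ProcessBuilder.environment()` starts out as a copy of the current JVM's environment, so extra variables can simply be added on top. The class and method names below are hypothetical, and the variables the SDK harness actually requires are not listed here.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the Beam code base. */
final class HarnessProcessEnv {

  /**
   * Builds a ProcessBuilder whose environment is the JVM's environment plus
   * the given extra variables (extra variables win on conflicts).
   */
  static ProcessBuilder withEnvironment(List<String> command, Map<String, String> extraEnv) {
    ProcessBuilder builder = new ProcessBuilder(command);
    // environment() is initialised as a copy of System.getenv().
    builder.environment().putAll(extraEnv);
    return builder;
  }
}
```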
   
   What do you want to achieve by propagating the environment of the JVM?




Issue Time Tracking
---

Worklog Id: (was: 139187)
Time Spent: 2h 50m  (was: 2h 40m)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similar 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4750) Beam performance degraded significantly since 2.4

2018-08-29 Thread Vojtech Janota (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596127#comment-16596127
 ] 

Vojtech Janota commented on BEAM-4750:
--

Guys, am I right to assume that the fix didn't make it into 2.6.0 and that 2.6.0 
should be added to the Affects Versions field?

> Beam performance degraded significantly since 2.4
> -
>
> Key: BEAM-4750
> URL: https://issues.apache.org/jira/browse/BEAM-4750
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Affects Versions: 2.4.0, 2.5.0
>Reporter: Vojtech Janota
>Assignee: Jean-Baptiste Onofré
>Priority: Major
> Fix For: 2.7.0
>
>
> Starting from Beam 2.4 onwards the *InMemoryStateInternals* class in the 
> *beam-runners-core-java* module does an expensive Coder encode/decode 
> operation when copying object state. This has a significant impact on 
> performance: pipelines that previously took a few minutes do not finish 
> within hours in our case. Based on the discussion on the dev mailing list, 
> the main motivation for this change was to enforce Coder sanity, something 
> that should arguably remain within the realm of the DirectRunner and should 
> not leak into the core layer.
> Links to commits that introduced the new behaviour:
>  * [https://github.com/apache/beam/commit/32a427c]
>  * [https://github.com/apache/beam/commit/8151d82]
>  
> Additional details and surrounding discussion can be found here:
>  * 
> https://lists.apache.org/thread.html/1e329318a4dafe27b9ff304d9460d05d8966e5ceebaf4ebfb948e2b8@%3Cdev.beam.apache.org%3E
>  
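
To make the cost described in the issue concrete, here is a minimal, self-contained sketch of "copy by encode/decode". It does not use Beam's classes: `SimpleCoder` is a hypothetical stand-in for a coder, and `copyViaCoder` only illustrates the general pattern of serialising a value and reading it back to obtain an independent copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Illustration only; these types are assumptions, not Beam APIs. */
final class CopyByEncoding {

  /** Hypothetical stand-in for a coder. */
  interface SimpleCoder<T> {
    void encode(T value, DataOutputStream out) throws IOException;

    T decode(DataInputStream in) throws IOException;
  }

  /** Coder for a list of strings: a length prefix followed by each element. */
  static final SimpleCoder<List<String>> STRING_LIST =
      new SimpleCoder<List<String>>() {
        @Override
        public void encode(List<String> value, DataOutputStream out) throws IOException {
          out.writeInt(value.size());
          for (String s : value) {
            out.writeUTF(s);
          }
        }

        @Override
        public List<String> decode(DataInputStream in) throws IOException {
          int n = in.readInt();
          List<String> result = new ArrayList<>(n);
          for (int i = 0; i < n; i++) {
            result.add(in.readUTF());
          }
          return result;
        }
      };

  /** Copies a value by a full encode/decode round trip: O(size) work and allocation per copy. */
  static <T> T copyViaCoder(SimpleCoder<T> coder, T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      coder.encode(value, out);
      out.flush();
      return coder.decode(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Every copy pays for a full serialisation and deserialisation of the value, which is why doing this on each state copy can dominate the runtime for large state objects.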





[jira] [Work logged] (BEAM-5107) Support ES 6.x for ElasticsearchIO

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5107?focusedWorklogId=139192&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139192
 ]

ASF GitHub Bot logged work on BEAM-5107:


Author: ASF GitHub Bot
Created on: 29/Aug/18 09:40
Start Date: 29/Aug/18 09:40
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6211: 
[BEAM-5107] Support ES-6.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/6211#discussion_r213607551
 
 

 ##
 File path: 
sdks/java/io/elasticsearch-tests/elasticsearch-tests-5/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOIT.java
 ##
 @@ -93,24 +86,7 @@ public static void afterClass() throws Exception {
 
   @Test
   public void testSplitsVolume() throws Exception {
 
 Review comment:
   There is no longer any point in having a separate testSplitVolume; a single 
testSplit method is enough. The only difference between the two methods is 
desiredBundleSizeBytes = 1. You can add this line to an else branch of `if 
(!useAsITests)` in testSplit. 




Issue Time Tracking
---

Worklog Id: (was: 139192)
Time Spent: 8h 10m  (was: 8h)

> Support ES 6.x for ElasticsearchIO
> --
>
> Key: BEAM-5107
> URL: https://issues.apache.org/jira/browse/BEAM-5107
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Dat Tran
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Elasticsearch has released 6.3.2 but ElasticsearchIO only supports 2x-5.x.
> We should support ES 6.x for ElasticsearchIO.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
> https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java





[jira] [Work logged] (BEAM-5124) Write Euphoria in Beam documentation

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5124?focusedWorklogId=139205&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139205
 ]

ASF GitHub Bot logged work on BEAM-5124:


Author: ASF GitHub Bot
Created on: 29/Aug/18 10:18
Start Date: 29/Aug/18 10:18
Worklog Time Spent: 10m 
  Work Description: VaclavPlajt edited a comment on issue #540: [BEAM-5124] 
DSL Euphoria documentation update
URL: https://github.com/apache/beam-site/pull/540#issuecomment-416902269
 
 
   How can I run the 'dead links' check again? It says that some links are dead 
but I found them to be ok.




Issue Time Tracking
---

Worklog Id: (was: 139205)
Time Spent: 20m  (was: 10m)

> Write Euphoria in Beam documentation
> 
>
> Key: BEAM-5124
> URL: https://issues.apache.org/jira/browse/BEAM-5124
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Reporter: Vaclav Plajt
>Assignee: Vaclav Plajt
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5124) Write Euphoria in Beam documentation

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5124?focusedWorklogId=139204&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139204
 ]

ASF GitHub Bot logged work on BEAM-5124:


Author: ASF GitHub Bot
Created on: 29/Aug/18 10:18
Start Date: 29/Aug/18 10:18
Worklog Time Spent: 10m 
  Work Description: VaclavPlajt commented on issue #540: [BEAM-5124] DSL 
Euphoria documentation update
URL: https://github.com/apache/beam-site/pull/540#issuecomment-416902269
 
 
   How can I run the 'dead links' check again? It says that some links are 
dead but I found them to be ok.




Issue Time Tracking
---

Worklog Id: (was: 139204)
Time Spent: 10m
Remaining Estimate: 0h

> Write Euphoria in Beam documentation
> 
>
> Key: BEAM-5124
> URL: https://issues.apache.org/jira/browse/BEAM-5124
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Reporter: Vaclav Plajt
>Assignee: Vaclav Plajt
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (BEAM-5195) Add support for `TopPerKey` operator

2018-08-29 Thread Vaclav Plajt (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaclav Plajt resolved BEAM-5195.

   Resolution: Fixed
Fix Version/s: Not applicable

> Add support for `TopPerKey` operator
> 
>
> Key: BEAM-5195
> URL: https://issues.apache.org/jira/browse/BEAM-5195
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Reporter: Vaclav Plajt
>Assignee: Vaclav Plajt
>Priority: Major
> Fix For: Not applicable
>
>
> `TopPerKey` operator is not supported due to its decomposition to 
> `ReduceStateByKey` operator which is not supported. That decomposition is 
> wrong since it outputs one value per key so no state is needed to perform the 
> reduction.
> Change decomposition of `TopPerKey` to `ReduceByKey`. That will make 
> `TopPerKey` translatable to Beam API.
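
The point that one output value per key needs no state can be sketched in plain Java. This is not Euphoria or Beam code: `TopPerKeySketch` and `topPerKey` are hypothetical names, and the example simply reduces each key's values with an associative "keep the larger" function, which is the shape of a `ReduceByKey`.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Illustration only; not the Euphoria operator. */
final class TopPerKeySketch {

  /** Keeps, for every key, the largest value according to {@code order}. */
  static <K, V> Map<K, V> topPerKey(List<Map.Entry<K, V>> input, Comparator<? super V> order) {
    // An associative, stateless combiner: of two candidates keep the larger one.
    BinaryOperator<V> keepLarger = BinaryOperator.maxBy(order);
    Map<K, V> best = new HashMap<>();
    for (Map.Entry<K, V> entry : input) {
      best.merge(entry.getKey(), entry.getValue(), keepLarger);
    }
    return best;
  }
}
```

Because the combiner only ever compares two candidates, no per-key state beyond the current best value is required.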





[jira] [Closed] (BEAM-5195) Add support for `TopPerKey` operator

2018-08-29 Thread Vaclav Plajt (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaclav Plajt closed BEAM-5195.
--

> Add support for `TopPerKey` operator
> 
>
> Key: BEAM-5195
> URL: https://issues.apache.org/jira/browse/BEAM-5195
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Reporter: Vaclav Plajt
>Assignee: Vaclav Plajt
>Priority: Major
> Fix For: Not applicable
>
>
> `TopPerKey` operator is not supported due to its decomposition to 
> `ReduceStateByKey` operator which is not supported. That decomposition is 
> wrong since it outputs one value per key so no state is needed to perform the 
> reduction.
> Change decomposition of `TopPerKey` to `ReduceByKey`. That will make 
> `TopPerKey` translatable to Beam API.





[jira] [Closed] (BEAM-5161) Enable FindBugs

2018-08-29 Thread Vaclav Plajt (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaclav Plajt closed BEAM-5161.
--

> Enable FindBugs
> ---
>
> Key: BEAM-5161
> URL: https://issues.apache.org/jira/browse/BEAM-5161
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Reporter: Vaclav Plajt
>Assignee: Vaclav Plajt
>Priority: Major
> Fix For: Not applicable
>
>
> Euphoria's `build.gradle` contains 
> `applyJavaNature(enableFindbugs: false)`. Enable FindBugs and solve 
> all the warnings.





[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596213#comment-16596213
 ] 

Tim Robertson commented on BEAM-5036:
-

I think the cross FS check is actually already in place here [~reuvenlax]

[https://github.com/apache/beam/blob/release-2.6.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java#L307]

In logs you get the following in rename() if the schemes are different:

{{Caused by: java.lang.IllegalArgumentException: Expect srcResourceIds and 
destResourceIds have the same scheme, but received file, hdfs.}}
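
A same-scheme check of this kind can be sketched as follows. This is not the code from FileSystems.java: `SchemeCheck` and `requireSingleScheme` are hypothetical names, resources are modelled as plain `java.net.URI` values, and the assumption that a missing scheme means "file" is mine.

```java
import java.net.URI;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Illustration only; not the Beam implementation. */
final class SchemeCheck {

  /** Returns the single scheme shared by all resources, or throws if there is not exactly one. */
  static String requireSingleScheme(List<URI> src, List<URI> dest) {
    Set<String> schemes = new TreeSet<>();
    for (URI uri : src) {
      schemes.add(schemeOf(uri));
    }
    for (URI uri : dest) {
      schemes.add(schemeOf(uri));
    }
    if (schemes.size() != 1) {
      throw new IllegalArgumentException(
          "Expected source and destination resources to share one scheme, but received "
              + String.join(", ", schemes)
              + ".");
    }
    return schemes.iterator().next();
  }

  /** Assumption for this sketch: a URI without a scheme refers to the local file system. */
  static String schemeOf(URI uri) {
    return uri.getScheme() == null ? "file" : uri.getScheme();
  }
}
```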

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename(), which can be much more 
> efficient for some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Work logged] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5036?focusedWorklogId=139218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139218
 ]

ASF GitHub Bot logged work on BEAM-5036:


Author: ASF GitHub Bot
Created on: 29/Aug/18 11:44
Start Date: 29/Aug/18 11:44
Worklog Time Spent: 10m 
  Work Description: timrobertson100 closed pull request #6289: [BEAM-5036] 
Optimize the FileBasedSink WriteOperation.moveToOutput()
URL: https://github.com/apache/beam/pull/6289
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
index 92b2382e365..9178e13e038 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
@@ -758,8 +758,8 @@ final void moveToOutputFiles(
   }
   // During a failure case, files may have been deleted in an earlier step. Thus
   // we ignore missing files here.
-  FileSystems.copy(srcFiles, dstFiles, StandardMoveOptions.IGNORE_MISSING_FILES);
-  removeTemporaryFiles(srcFiles);
+  FileSystems.rename(srcFiles, dstFiles, StandardMoveOptions.IGNORE_MISSING_FILES);
+  removeTemporaryFiles(srcFiles); // defensive coding
 }
 
 /**


 




Issue Time Tracking
---

Worklog Id: (was: 139218)
Time Spent: 1.5h  (was: 1h 20m)

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename(), which can be much more 
> efficient for some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Work logged] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5036?focusedWorklogId=139217&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139217
 ]

ASF GitHub Bot logged work on BEAM-5036:


Author: ASF GitHub Bot
Created on: 29/Aug/18 11:44
Start Date: 29/Aug/18 11:44
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #6289: [BEAM-5036] 
Optimize the FileBasedSink WriteOperation.moveToOutput()
URL: https://github.com/apache/beam/pull/6289#issuecomment-416923330
 
 
   *DO NOT MERGE THIS*
   
   Rename() will not overwrite existing files and instead will surface e.g.:
   ```
   Unable to rename resource hdfs://ha-nn/tmp/delme/.temp-beam-2018-08-29_11-41-47-0/1d676ec2-787d-4357-838f-f904e8d57b3d to hdfs://ha-nn/tmp/es-2012.txt-0-of-00045. No further information provided by underlying filesystem.
   ```
   
   We need to consider the consequences of this (and add tests).
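
   The difference in overwrite behaviour can be reproduced on a local file system with plain `java.nio.file`, independent of Beam or HDFS. This is only an analogy under stated assumptions: `MoveSemantics` is a hypothetical class, and `Files.move` without `REPLACE_EXISTING` stands in for a rename that refuses to overwrite.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustration only; local-file analogy, not the Beam or Hadoop code path. */
final class MoveSemantics {

  /** Copy then delete: silently replaces an existing target. */
  static void moveByCopy(Path src, Path dst) throws IOException {
    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
    Files.delete(src);
  }

  /** Rename: fails with FileAlreadyExistsException if the target already exists. */
  static void moveByRename(Path src, Path dst) throws IOException {
    Files.move(src, dst);
  }
}
```

   Any test for the rename-based path would need to cover the "target already exists" case explicitly, because the two strategies disagree exactly there.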




Issue Time Tracking
---

Worklog Id: (was: 139217)
Time Spent: 1h 20m  (was: 1h 10m)

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename(), which can be much more 
> efficient for some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596233#comment-16596233
 ] 

Tim Robertson commented on BEAM-5036:
-

The changes (yet to be merged) to rename() in BEAM-4861 now create directories 
if missing, but also surface an exception if the underlying operation reports 
that it did not complete.

This means it will fail with an exception if the target file already exists:
{code}
Caused by: java.io.IOException: Unable to rename resource hdfs://ha-nn/tmp/delme/.temp-beam-2018-08-29_11-41-47-0/1d676ec2-787d-4357-838f-f904e8d57b3d to hdfs://ha-nn/tmp/es-2012.txt-0-of-00045. No further information provided by underlying filesystem.
at org.apache.beam.sdk.io.hdfs.HadoopFileSystem.rename(HadoopFileSystem.java:181)
at org.apache.beam.sdk.io.FileSystems.rename(FileSystems.java:326)
at org.apache.beam.sdk.io.FileBasedSink$WriteOperation.moveToOutputFiles(FileBasedSink.java:761)
at org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn.process(WriteFiles.java:801)
{code} 

The original implementation using copy() would overwrite files without warning. 

Do we wish to silently overwrite files when issuing a rename()?

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename(), which can be much more 
> efficient for some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Comment Edited] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596233#comment-16596233
 ] 

Tim Robertson edited comment on BEAM-5036 at 8/29/18 12:03 PM:
---

The changes (yet to be merged) to rename() in BEAM-4861 now create directories 
if missing, but also surface an exception if the underlying operation reports 
that it did not complete.

This means it will fail with an exception if the target file already exists:
{code}
Caused by: java.io.IOException: Unable to rename resource hdfs://ha-nn/tmp/delme/.temp-beam-2018-08-29_11-41-47-0/1d676ec2-787d-4357-838f-f904e8d57b3d to hdfs://ha-nn/tmp/es-2012.txt-0-of-00045. No further information provided by underlying filesystem.
at org.apache.beam.sdk.io.hdfs.HadoopFileSystem.rename(HadoopFileSystem.java:181)
at org.apache.beam.sdk.io.FileSystems.rename(FileSystems.java:326)
at org.apache.beam.sdk.io.FileBasedSink$WriteOperation.moveToOutputFiles(FileBasedSink.java:761)
at org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn.process(WriteFiles.java:801)
{code} 

The original implementation using copy() would overwrite files without warning. 

Do we wish to silently overwrite files when issuing a rename()? I am used to 
Hadoop operations failing if the output already exists so for me it sounds 
wrong - I'd rather be forced to delete manually than accidentally be able to 
overwrite TBs of data.


was (Author: timrobertson100):
The changes (yet to be merged) to rename() in BEAM-4861 now create directories 
if missing, but also surface an exception if the underlying operation reports 
that it did not complete.

This means it will fail with an exception if the target file already exists:
{code}
Caused by: java.io.IOException: Unable to rename resource hdfs://ha-nn/tmp/delme/.temp-beam-2018-08-29_11-41-47-0/1d676ec2-787d-4357-838f-f904e8d57b3d to hdfs://ha-nn/tmp/es-2012.txt-0-of-00045. No further information provided by underlying filesystem.
at org.apache.beam.sdk.io.hdfs.HadoopFileSystem.rename(HadoopFileSystem.java:181)
at org.apache.beam.sdk.io.FileSystems.rename(FileSystems.java:326)
at org.apache.beam.sdk.io.FileBasedSink$WriteOperation.moveToOutputFiles(FileBasedSink.java:761)
at org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn.process(WriteFiles.java:801)
{code} 

The original implementation using copy() would overwrite files without warning. 

Do we wish to silently overwrite files when issuing a rename()?

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename(), which can be much more 
> efficient for some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5172) org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest is flaky

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5172?focusedWorklogId=139224&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139224
 ]

ASF GitHub Bot logged work on BEAM-5172:


Author: ASF GitHub Bot
Created on: 29/Aug/18 12:18
Start Date: 29/Aug/18 12:18
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #6279:  [BEAM-5172] Fix 
Elasticsearch UTests flakiness
URL: https://github.com/apache/beam/pull/6279#issuecomment-416931730
 
 
   thx @mxm 




Issue Time Tracking
---

Worklog Id: (was: 139224)
Time Spent: 1.5h  (was: 1h 20m)

> org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest is flaky
> -
>
> Key: BEAM-5172
> URL: https://issues.apache.org/jira/browse/BEAM-5172
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-elasticsearch, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In a recent PostCommit build, 
> https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1290/testReport/junit/org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest/testRead/
>  failed with:
> Error Message
> java.lang.AssertionError: Count/Flatten.PCollections.out: 
> Expected: <400L>
>  but: was <470L>
> Stacktrace
> java.lang.AssertionError: Count/Flatten.PCollections.out: 
> Expected: <400L>
>  but: was <470L>
>   at 
> org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:168)
>   at org.apache.beam.sdk.testing.PAssert.thatSingleton(PAssert.java:413)
>   at org.apache.beam.sdk.testing.PAssert.thatSingleton(PAssert.java:404)
>   at 
> org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.testRead(ElasticsearchIOTestCommon.java:124)
>   at 
> org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testRead(ElasticsearchIOTest.java:125)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.jun

[jira] [Resolved] (BEAM-5172) org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest is flaky

2018-08-29 Thread Etienne Chauchot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-5172.

   Resolution: Fixed
Fix Version/s: 2.7.0

> org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest is flaky
> -
>
> Key: BEAM-5172
> URL: https://issues.apache.org/jira/browse/BEAM-5172
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-elasticsearch, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Etienne Chauchot
>Priority: Major
> Fix For: 2.7.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In a recent PostCommit build, 
> https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1290/testReport/junit/org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest/testRead/
>  failed with:
> Error Message
> java.lang.AssertionError: Count/Flatten.PCollections.out: 
> Expected: <400L>
>  but: was <470L>
> Stacktrace
> java.lang.AssertionError: Count/Flatten.PCollections.out: 
> Expected: <400L>
>  but: was <470L>
>   at 
> org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:168)
>   at org.apache.beam.sdk.testing.PAssert.thatSingleton(PAssert.java:413)
>   at org.apache.beam.sdk.testing.PAssert.thatSingleton(PAssert.java:404)
>   at 
> org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.testRead(ElasticsearchIOTestCommon.java:124)
>   at 
> org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testRead(ElasticsearchIOTest.java:125)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>  

[jira] [Comment Edited] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596233#comment-16596233
 ] 

Tim Robertson edited comment on BEAM-5036 at 8/29/18 12:20 PM:
---

The changes (yet to be merged) to rename() in BEAM-4861 now create directories 
if missing, but also surface an exception if the underlying operation reports 
that it did not complete.

This means it will fail with an exception if the target file already exists:
{code}
Caused by: java.io.IOException: Unable to rename resource 
hdfs://ha-nn/tmp/delme/.temp-beam-2018-08-29_11-41-47-0/1d676ec2-787d-4357-838f-f904e8d57b3d
 to hdfs://ha-nn/tmp/es-2012.txt-0-of-00045. No further information 
provided by underlying filesystem.
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.rename(HadoopFileSystem.java:181)
at org.apache.beam.sdk.io.FileSystems.rename(FileSystems.java:326)
at 
org.apache.beam.sdk.io.FileBasedSink$WriteOperation.moveToOutputFiles(FileBasedSink.java:761)
at 
org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn.process(WriteFiles.java:801)
{code} 

The original implementation using copy() would overwrite files without warning. 

Do we wish to silently overwrite files when issuing a rename()? I am used to 
Hadoop operations failing if the output already exists so for me it is correct 
to fail if the output exists - I'd rather be forced to delete manually than 
accidentally be able to overwrite TBs of data.
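
To make the two behaviours under discussion concrete, here is a minimal sketch using only java.nio on a local filesystem. It is an analogy, not the Beam or Hadoop code: the class and method names (RenameSemanticsSketch, renameFailIfExists, renameOverwrite) are made up for illustration. Files.move without options fails when the target exists, while StandardCopyOption.REPLACE_EXISTING replaces it silently, which is roughly what the old copy()-based path did.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; not part of Beam.
public class RenameSemanticsSketch {

  /** Strict rename: fails if dst already exists. */
  static void renameFailIfExists(Path src, Path dst) throws IOException {
    Files.move(src, dst);
  }

  /** Lenient rename: silently replaces dst. */
  static void renameOverwrite(Path src, Path dst) throws IOException {
    Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
  }

  /** Creates a temp dir with a "temp-file" (content "new") and an existing "output" (content "old"). */
  private static Path prepareDir() throws IOException {
    Path dir = Files.createTempDirectory("rename-sketch");
    Files.write(dir.resolve("temp-file"), "new".getBytes(StandardCharsets.UTF_8));
    Files.write(dir.resolve("output"), "old".getBytes(StandardCharsets.UTF_8));
    return dir;
  }

  /** Returns true if the strict rename refused to replace the existing output. */
  static boolean strictRenameRejectsExistingTarget() {
    try {
      Path dir = prepareDir();
      try {
        renameFailIfExists(dir.resolve("temp-file"), dir.resolve("output"));
        return false;
      } catch (FileAlreadyExistsException e) {
        return true;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns the content of the output file after a lenient rename. */
  static String contentAfterOverwritingRename() {
    try {
      Path dir = prepareDir();
      renameOverwrite(dir.resolve("temp-file"), dir.resolve("output"));
      return new String(Files.readAllBytes(dir.resolve("output")), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("strict rename rejected existing target: "
        + strictRenameRejectsExistingTarget());
    System.out.println("content after overwriting rename: "
        + contentAfterOverwritingRename());
  }
}
```

The strict variant forces the caller to delete the old output explicitly, which is the behaviour argued for above; the lenient variant would have to be an opt-in if it is wanted at all.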


was (Author: timrobertson100):
The changes (yet to be merged) to rename() in BEAM-4861 now create directories 
if missing, but also surface an exception if the underlying operation reports 
that it did not complete.

This means it will fail with an exception if the target file already exists:
{code}
Caused by: java.io.IOException: Unable to rename resource 
hdfs://ha-nn/tmp/delme/.temp-beam-2018-08-29_11-41-47-0/1d676ec2-787d-4357-838f-f904e8d57b3d
 to hdfs://ha-nn/tmp/es-2012.txt-0-of-00045. No further information 
provided by underlying filesystem.
at 
org.apache.beam.sdk.io.hdfs.HadoopFileSystem.rename(HadoopFileSystem.java:181)
at org.apache.beam.sdk.io.FileSystems.rename(FileSystems.java:326)
at 
org.apache.beam.sdk.io.FileBasedSink$WriteOperation.moveToOutputFiles(FileBasedSink.java:761)
at 
org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn.process(WriteFiles.java:801)
{code} 

The original implementation using copy() would overwrite files without warning. 

Do we wish to silently overwrite files when issuing a rename()? I am used to 
Hadoop operations failing if the output already exists so for me it sounds 
wrong - I'd rather be forced to delete manually than accidentally be able to 
overwrite TBs of data.

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename() which can be much more 
> efficient for some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139233&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139233
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 12:48
Start Date: 29/Aug/18 12:48
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #6287: [BEAM-5187] Add a 
ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#issuecomment-416940051
 
 
   @tweise I've improved the process startup. It will now fail immediately if 
the process exits prematurely. Regarding the container contract variables, I 
don't think it makes sense to set them as environment variables. This works for 
the Python SDK but not for Go, because Go does not use environment variables. We 
could change that, but I'm not sure that is a good idea because it would 
create a double contract. Instead, I think your startup executable should decide 
what to do with the provided parameters, e.g. set environment variables before 
bringing up the SDK harness.




Issue Time Tracking
---

Worklog Id: (was: 139233)
Time Spent: 3h  (was: 2h 50m)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similarly 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]





[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139242&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139242
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 13:19
Start Date: 29/Aug/18 13:19
Worklog Time Spent: 10m 
  Work Description: tweise commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213672219
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DockerJobBundleFactory.java
 ##
 @@ -234,7 +125,7 @@ protected ServerFactory getServerFactory() {
 // We only use the published Docker ports 8100-8200 in a 
round-robin fashion
 () -> MAC_PORT.getAndUpdate(val -> val == MAC_PORT_END ? 
MAC_PORT_START : val + 1));
   default:
-LOG.warn("Unknown Docker platform. Falling back to default server 
factory");
+logger.warn("Unknown Docker platform. Falling back to default server 
factory");
 
 Review comment:
   I think there should be multiple instances since we want to see where the 
logging actually occurred (superclass vs. subclass).
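
A small sketch of the difference, in case it helps the discussion. It uses java.util.logging instead of SLF4J so that it runs without dependencies, and FactoryBase / DockerFactory are hypothetical stand-ins for the classes in this PR. A logger obtained via getClass() is named after the runtime subclass even for statements written in the superclass, whereas a static per-class logger is named after the class that declares it.

```java
import java.util.logging.Logger;

// Hypothetical illustration only; the PR itself uses SLF4J.
public class LoggerPlacementSketch {

  /** Base class holding both styles of logger for comparison. */
  abstract static class FactoryBase {
    // Static logger: always named after the declaring class (FactoryBase).
    private static final Logger STATIC_LOG =
        Logger.getLogger(FactoryBase.class.getName());

    // Instance logger: named after the runtime class, i.e. the concrete subclass.
    protected final Logger instanceLog = Logger.getLogger(getClass().getName());

    String staticLoggerName() {
      return STATIC_LOG.getName();
    }

    String instanceLoggerName() {
      return instanceLog.getName();
    }
  }

  /** Concrete subclass that inherits the instance logger. */
  static class DockerFactory extends FactoryBase {}

  public static void main(String[] args) {
    FactoryBase factory = new DockerFactory();
    System.out.println("static logger:   " + factory.staticLoggerName());
    System.out.println("instance logger: " + factory.instanceLoggerName());
  }
}
```

With the instance logger, a warning emitted from base-class code is attributed to the subclass, so the log no longer tells you which of the two classes the statement lives in.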




Issue Time Tracking
---

Worklog Id: (was: 139242)
Time Spent: 3h 10m  (was: 3h)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similarly 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]





[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139244&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139244
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 13:24
Start Date: 29/Aug/18 13:24
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #6287: [BEAM-5187] Add a 
ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#issuecomment-416951911
 
 
   @mxm yes, I saw in the debugger that the process exits right away with exit 
code 1. It may actually be good to also capture the output. As for the environment 
variables, my suggestion wasn't to use environment variables in your 
implementation, just to support the ability to specify them in `runCommand`, since 
it is one of the basic ingredients of launching a process and makes it reusable. 
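
A minimal sketch of such a helper, assuming a POSIX `sh` is on the PATH for the demo command. The `runCommand` signature, the `Result` type and the `HARNESS_ID` variable are hypothetical and not the ones in this PR. It passes extra environment variables through ProcessBuilder, merges stderr into stdout, and returns the exit code together with the captured output so a premature exit can be reported with its cause.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the helper from the PR.
public class RunCommandSketch {

  /** Exit code plus combined stdout/stderr of a finished process. */
  static final class Result {
    final int exitCode;
    final String output;

    Result(int exitCode, String output) {
      this.exitCode = exitCode;
      this.output = output;
    }
  }

  /** Runs the command with extra environment variables and waits for it to finish. */
  static Result runCommand(List<String> command, Map<String, String> env) {
    ProcessBuilder builder = new ProcessBuilder(command);
    builder.environment().putAll(env);   // added on top of the inherited environment
    builder.redirectErrorStream(true);   // capture stderr together with stdout
    try {
      Process process = builder.start();
      String output;
      try (BufferedReader reader = new BufferedReader(
          new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
        output = reader.lines().collect(Collectors.joining("\n"));
      }
      return new Result(process.waitFor(), output);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for process", e);
    }
  }

  public static void main(String[] args) {
    Result result = runCommand(
        Arrays.asList("sh", "-c", "echo \"harness id: $HARNESS_ID\"; exit 1"),
        Collections.singletonMap("HARNESS_ID", "42"));
    if (result.exitCode != 0) {
      System.err.println("process exited with code " + result.exitCode
          + ", output: " + result.output);
    }
  }
}
```

This version blocks until the process ends, which suits a quick startup check; a long-running SDK harness would need its output drained on a separate thread instead.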




Issue Time Tracking
---

Worklog Id: (was: 139244)
Time Spent: 3h 20m  (was: 3h 10m)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similarly 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]





[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139256&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139256
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 13:39
Start Date: 29/Aug/18 13:39
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213585817
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/JobBundleFactoryBase.java
 ##
 @@ -0,0 +1,331 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.control;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.RemovalNotification;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Iterables;
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.IOException;
+import java.util.Map;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.Target;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.ServerFactory;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import 
org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService;
+import 
org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import 
org.apache.beam.runners.fnexecution.control.SdkHarnessClient.BundleProcessor;
+import org.apache.beam.runners.fnexecution.data.GrpcDataService;
+import 
org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.EnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.RemoteEnvironment;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.logging.Slf4jLogWriter;
+import org.apache.beam.runners.fnexecution.provisioning.JobInfo;
+import 
org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.runners.fnexecution.state.GrpcStateService;
+import org.apache.beam.runners.fnexecution.state.StateRequestHandler;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.apache.beam.sdk.fn.IdGenerators;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.fn.stream.OutboundObserverFactory;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A {@link JobBundleFactory} that uses a {@link DockerEnvironmentFactory} for 
environment
+ * management. Note that returned {@link StageBundleFactory stage bundle 
factories} are not
+ * thread-safe. Instead, a new stage factory should be created for each client.
+ */
+@ThreadSafe
+public abstract class JobBundleFactoryBase implements JobBundleFactory {
+  protected final Logger logger = LoggerFactory.getLogger(getClass());
 
 Review comment:
   edit: Re-introduced the static instances.




Issue Time Tracking
---

Worklog Id: (was: 139256)

[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Etienne Chauchot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596336#comment-16596336
 ] 

Etienne Chauchot commented on BEAM-5036:


I have mixed emotions regarding this: 
- silent "fail" (or overwrite in that case) is usually bad
- but local filesystems like ext4 for example silently overwrite the file in 
that case.
- if distributed filesystems like HDFS tend to fail in that case, maybe people 
are used to that behavior in this big data ecosystem
=> What is the general consensus among big data fs technologies? What is the 
behavior of GS in that case? We could test them and then implement what the 
majority says 

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename() which can be much more 
> efficient for some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Comment Edited] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Etienne Chauchot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596336#comment-16596336
 ] 

Etienne Chauchot edited comment on BEAM-5036 at 8/29/18 1:40 PM:
-

I have mixed emotions regarding this: 
- silent "fail" (or overwrite in that case) is usually bad
- but local filesystems like ext4 for example silently overwrite the file in 
that case.
- if distributed filesystems like HDFS tend to fail in that case, maybe people 
are used to that behavior in this big data ecosystem
=> What is the general consensus among big data fs technologies? What is the 
behavior of GS and S3 in that case? We could test them and then implement what 
the majority says 


was (Author: echauchot):
I have mixed emotions regarding this: 
- silent "fail" (or overwrite in that case) is usually bad
- but local filesystems like ext4 for example silently overwrite the file in 
that case.
- if distributed filesystems like HDFS tend to fail in that case, maybe people 
are used to that behavior in this big data ecosystem
=> What is the general consensus among big data fs technologies? What is the 
behavior of GS in that case? We could test them and then implement what the 
majority says 

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename() which can be much more 
> efficient for some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139257&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139257
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 13:40
Start Date: 29/Aug/18 13:40
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213680869
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DockerJobBundleFactory.java
 ##
 @@ -234,7 +125,7 @@ protected ServerFactory getServerFactory() {
 // We only use the published Docker ports 8100-8200 in a 
round-robin fashion
 () -> MAC_PORT.getAndUpdate(val -> val == MAC_PORT_END ? 
MAC_PORT_START : val + 1));
   default:
-LOG.warn("Unknown Docker platform. Falling back to default server 
factory");
+logger.warn("Unknown Docker platform. Falling back to default server 
factory");
 
 Review comment:
   Makes sense. Re-introduced the static instances.




Issue Time Tracking
---

Worklog Id: (was: 139257)
Time Spent: 3h 40m  (was: 3.5h)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similarly 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]





[jira] [Commented] (BEAM-3926) Support MetricsPusher in Dataflow Runner

2018-08-29 Thread Etienne Chauchot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596346#comment-16596346
 ] 

Etienne Chauchot commented on BEAM-3926:


Hi [~foegler], do you need any help with that?

> Support MetricsPusher in Dataflow Runner
> 
>
> Key: BEAM-3926
> URL: https://issues.apache.org/jira/browse/BEAM-3926
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Scott Wegner
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> See [relevant email 
> thread|https://lists.apache.org/thread.html/2e87f0adcdf8d42317765f298e3e6fdba72917a72d4a12e71e67e4b5@%3Cdev.beam.apache.org%3E].
>  From [~echauchot]:
>   
> _AFAIK Dataflow being a cloud hosted engine, the related runner is very 
> different from the others. It just submits a job to the cloud hosted engine. 
> So, no access to metrics container etc... from the runner. So I think that 
> the MetricsPusher (component responsible for merging metrics and pushing them 
> to a sink backend) must not be instantiated in DataflowRunner; otherwise it 
> would be more a client (driver) piece of code and we will lose all the 
> interest of being close to the execution engine (among other things 
> instrumentation of the execution of the pipelines).  I think that the 
> MetricsPusher needs to be instantiated in the actual Dataflow engine._
>  
>   





[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596353#comment-16596353
 ] 

Tim Robertson commented on BEAM-5036:
-

Thanks [~echauchot]
Gcs and S3 have no notion of a rename they are copy (overwrite) and delete (see 
links in comment above).

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename(), which can be much more 
> efficient on some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Comment Edited] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596353#comment-16596353
 ] 

Tim Robertson edited comment on BEAM-5036 at 8/29/18 1:51 PM:
--

Thanks [~echauchot]
Gcs and S3 have no notion of a rename and are implemented as copy (overwrite) 
and delete (see links in comment above).


was (Author: timrobertson100):
Thanks [~echauchot]
Gcs and S3 have no notion of a rename they are copy (overwrite) and delete (see 
links in comment above).

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename(), which can be much more 
> efficient on some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139270&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139270
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 14:03
Start Date: 29/Aug/18 14:03
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #6287: [BEAM-5187] Add a 
ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#issuecomment-416965280
 
 
   I was able to launch the Python process directly when running the JVM within 
a virtualenv. Environment variables are forwarded by default. I just need the 
ability to specify additional variables that are needed by the worker (in my 
customized launcher).




Issue Time Tracking
---

Worklog Id: (was: 139270)
Time Spent: 3h 50m  (was: 3h 40m)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similarly 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]





[jira] [Created] (BEAM-5256) Update Bigtable dependency

2018-08-29 Thread Kevin Si (JIRA)
Kevin Si created BEAM-5256:
--

 Summary: Update Bigtable dependency
 Key: BEAM-5256
 URL: https://issues.apache.org/jira/browse/BEAM-5256
 Project: Beam
  Issue Type: Improvement
  Components: dependencies
Affects Versions: 2.6.0
Reporter: Kevin Si
 Fix For: 2.7.0


According to the following dependency tree, 
"beam-sdks-java-io-google-cloud-platform" is depending on "bigtable-protos". 

[INFO] Verbose not supported since maven-dependency-plugin 3.0
[INFO] com.google.cloud.teleport:google-cloud-teleport-java:jar:0.1-SNAPSHOT
[INFO] \- org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.6.0:compile
[INFO] \- com.google.cloud.bigtable:bigtable-protos:jar:1.0.0-pre3:compile

 

Instead, it should depend on:

 

com.google.api.grpc
proto-google-cloud-bigtable-v2
0.24.0

 





[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Neville Li (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596392#comment-16596392
 ] 

Neville Li commented on BEAM-5036:
--

If I understand this correctly, this issue affects all file based IOs, 
including Avro? We have a lot of jobs with huge Avro outputs.

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename(), which can be much more 
> efficient on some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139276&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139276
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 14:20
Start Date: 29/Aug/18 14:20
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #6287: [BEAM-5187] Add a 
ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#issuecomment-416971212
 
 
   >It may actually be good to also capture the output.
   
   For debugging purposes we could save some of the output but I don't want to 
read all the output because it could create memory pressure. For the startup 
the output might be helpful but everything else should go through the logging 
server.
   
   >As for the environment variables, my suggestion wasn't to use environment 
variables in your implementation, just to support the ability to specify them in 
runCommand, since it is one of the basic ingredients of launching a process and 
makes it reusable.
   
   I see, that can be done.
   
   >I was able to launch the python process directly when running the JVM 
within virtualenv.
   
   Great :)
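
A minimal sketch of the two points discussed above: passing additional environment variables to a launched process, and keeping only a bounded tail of its output so that reading it cannot create memory pressure. This is not the code from the pull request. It uses the Python standard library, and `run_command`, `extra_env`, `max_lines` and the `WORKER_ID` variable are hypothetical names chosen for illustration.

```python
import os
import subprocess
import sys
from collections import deque


def run_command(args, extra_env=None, max_lines=100):
    """Run a child process and return (exit code, last `max_lines` output lines)."""
    env = dict(os.environ)          # the parent's environment is forwarded by default
    env.update(extra_env or {})     # caller-supplied variables are added on top
    tail = deque(maxlen=max_lines)  # bounded buffer: older lines are dropped
    with subprocess.Popen(
        args,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into stdout
        text=True,
    ) as proc:
        for line in proc.stdout:
            tail.append(line.rstrip("\n"))
    return proc.returncode, list(tail)


if __name__ == "__main__":
    # WORKER_ID is a made-up variable, used only to show the forwarding.
    code, lines = run_command(
        [sys.executable, "-c", "import os; print(os.environ['WORKER_ID'])"],
        extra_env={"WORKER_ID": "7"},
    )
    print(code, lines)
```

A real launcher would probably forward each line to the runner's logger as it arrives rather than only keeping the tail; the bounded buffer here just shows one way to limit memory use.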
   




Issue Time Tracking
---

Worklog Id: (was: 139276)
Time Spent: 4h  (was: 3h 50m)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similarly 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]





[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596407#comment-16596407
 ] 

Tim Robertson commented on BEAM-5036:
-

Yes [~sinisa_lyh] it does.

I have observed big performance gains: rewriting 1.5TB of Avro files 
({{AvroIO.write()}}) using Beam on Spark with HDFS dropped from 1.7 hours to 42 
minutes on one of my clusters.
 It will only impact HDFS and LocalFileSystem though, as S3 and GCS do not have 
the ability to do an optimised rename.
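
To illustrate the difference between the two move strategies, here is a small sketch on a local filesystem. It is not Beam's FileSystems code; it only uses the Python standard library, and the function names are hypothetical. The assumption is that source and destination sit on the same filesystem, where a rename does not copy the file's data.

```python
import os
import shutil
import tempfile


def move_by_copy_delete(src, dst):
    """Move a file the expensive way: copy every byte, then delete the source."""
    shutil.copyfile(src, dst)
    os.remove(src)


def move_by_rename(src, dst):
    """Move a file with a single rename, falling back to copy+delete on failure."""
    try:
        os.replace(src, dst)  # works across directories on the same filesystem
    except OSError:
        move_by_copy_delete(src, dst)


def demo():
    """Move one temporary file with each strategy.

    Returns a list of (destination contents, source still exists) pairs.
    """
    results = []
    with tempfile.TemporaryDirectory() as root:
        out_dir = os.path.join(root, "output")
        os.mkdir(out_dir)
        for mover in (move_by_copy_delete, move_by_rename):
            src = os.path.join(root, "temp-shard")
            dst = os.path.join(out_dir, mover.__name__)
            with open(src, "w") as f:
                f.write("payload")
            mover(src, dst)
            with open(dst) as f:
                results.append((f.read(), os.path.exists(src)))
    return results


if __name__ == "__main__":
    print(demo())
```

Both strategies end in the same state; the difference is only in how much data is touched, which is why object stores without a native rename see no gain.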

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename(), which can be much more 
> efficient on some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Neville Li (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596410#comment-16596410
 ] 

Neville Li commented on BEAM-5036:
--

Yeah that's my main concern. We use GCS almost exclusively so all our jobs are 
affected by this. 

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename(), which can be much more 
> efficient on some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Work logged] (BEAM-5124) Write Euphoria in Beam documentation

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5124?focusedWorklogId=139285&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139285
 ]

ASF GitHub Bot logged work on BEAM-5124:


Author: ASF GitHub Bot
Created on: 29/Aug/18 14:36
Start Date: 29/Aug/18 14:36
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #540: [BEAM-5124] DSL 
Euphoria documentation update
URL: https://github.com/apache/beam-site/pull/540#issuecomment-416977085
 
 
   You can comment "retest this please" in the pull request, and that should
   retrigger the tests
   
   On Wed, Aug 29, 2018, 3:18 AM Vaclav Plajt  wrote:
   
   > How can I run the 'dead links' check again? It says that some links are
   > dead but I found them to be OK.
   




Issue Time Tracking
---

Worklog Id: (was: 139285)
Time Spent: 0.5h  (was: 20m)

> Write Euphoria in Beam documentation
> 
>
> Key: BEAM-5124
> URL: https://issues.apache.org/jira/browse/BEAM-5124
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Reporter: Vaclav Plajt
>Assignee: Vaclav Plajt
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-5256) Update Bigtable dependency

2018-08-29 Thread Kevin Si (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596427#comment-16596427
 ] 

Kevin Si commented on BEAM-5256:


On second thought, we may want to use a different version of the Bigtable client, so 
I am putting this change on hold.

> Update Bigtable dependency
> --
>
> Key: BEAM-5256
> URL: https://issues.apache.org/jira/browse/BEAM-5256
> Project: Beam
>  Issue Type: Improvement
>  Components: dependencies
>Affects Versions: 2.6.0
>Reporter: Kevin Si
>Priority: Minor
> Fix For: 2.7.0
>
>
> According to the following dependency tree, 
> "beam-sdks-java-io-google-cloud-platform" is depending on "bigtable-protos". 
> [INFO] Verbose not supported since maven-dependency-plugin 3.0
> [INFO] com.google.cloud.teleport:google-cloud-teleport-java:jar:0.1-SNAPSHOT
> [INFO] \- org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.6.0:compile
> [INFO] \- com.google.cloud.bigtable:bigtable-protos:jar:1.0.0-pre3:compile
>  
> Instead, it should depend on:
>  
> com.google.api.grpc
> proto-google-cloud-bigtable-v2
> 0.24.0
>  





[jira] [Closed] (BEAM-5256) Update Bigtable dependency

2018-08-29 Thread Kevin Si (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Si closed BEAM-5256.
--
Resolution: Won't Fix

> Update Bigtable dependency
> --
>
> Key: BEAM-5256
> URL: https://issues.apache.org/jira/browse/BEAM-5256
> Project: Beam
>  Issue Type: Improvement
>  Components: dependencies
>Affects Versions: 2.6.0
>Reporter: Kevin Si
>Priority: Minor
> Fix For: 2.7.0
>
>
> According to the following dependency tree, 
> "beam-sdks-java-io-google-cloud-platform" is depending on "bigtable-protos". 
> [INFO] Verbose not supported since maven-dependency-plugin 3.0
> [INFO] com.google.cloud.teleport:google-cloud-teleport-java:jar:0.1-SNAPSHOT
> [INFO] \- org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.6.0:compile
> [INFO] \- com.google.cloud.bigtable:bigtable-protos:jar:1.0.0-pre3:compile
>  
> Instead, it should depend on:
>  
> com.google.api.grpc
> proto-google-cloud-bigtable-v2
> 0.24.0
>  





[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139288&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139288
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 14:57
Start Date: 29/Aug/18 14:57
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #6287: [BEAM-5187] Add a 
ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#issuecomment-416984730
 
 
   Probably all of the process stdout/stderr output should go to runner 
logging, irrespective of whether at startup or later? If we redirect it 
elsewhere, then important information may be missing, such as when there is an 
issue with process startup or unexpected termination.




Issue Time Tracking
---

Worklog Id: (was: 139288)
Time Spent: 4h 10m  (was: 4h)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similarly 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]





[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Jozef Vilcek (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596491#comment-16596491
 ] 

Jozef Vilcek commented on BEAM-5036:


As far as I understand this, there can be only gain, no performance 
degradation. The gain is for filesystems with an efficient rename, like HDFS. As for GCS, things 
should stay the same. `moveToOutput()` currently does a `copy` of the data plus a `delete` of the source. 
The new way would be a `rename`, and GCS implements this (as [~timrobertson100] 
states above) as `copy` + `delete` too.

So jobs should not be affected? Am I missing something?

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename(), which can be much more 
> efficient on some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Neville Li (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596513#comment-16596513
 ] 

Neville Li commented on BEAM-5036:
--

{{copy+delete}} is still expensive on GCS, especially when running tens of thousands of jobs 
writing TBs of data daily. My memory is a bit vague, but was there a time when 
{{AvroIO}} wrote to output files directly without a {{rename}} or 
{{copy+delete}}?

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename(), which can be much more 
> efficient on some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Assigned] (BEAM-5255) Fix over-aggressive division futurization.

2018-08-29 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev reassigned BEAM-5255:
-

Assignee: Valentyn Tymofieiev  (was: Ahmet Altay)

> Fix over-aggressive division futurization.
> --
>
> Key: BEAM-5255
> URL: https://issues.apache.org/jira/browse/BEAM-5255
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.6.0
>Reporter: Robert Bradshaw
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When converting from Python 2 to Python 3, `a / b` becomes `a // b` only for 
> ints, but it is incorrect to do this substitution for floating point 
> division. 
>  
> I noticed this change in the microbenchmarks, but we should do an audit to 
> make sure we haven't broken things elsewhere. 





[jira] [Commented] (BEAM-5255) Fix over-aggressive division futurization.

2018-08-29 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596524#comment-16596524
 ] 

Valentyn Tymofieiev commented on BEAM-5255:
---

We caught several incorrect division conversions (done by a tool) in code 
reviews, but it looks like we missed this one.
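
For illustration, a short sketch of why the substitution is wrong for timing code. The function names are hypothetical and are not taken from the benchmarks; the behaviour shown is that of Python 3, where `/` is true division and `//` floors its result.

```python
def per_element_cost(total_seconds, num_elements):
    """True division keeps the fractional part of a timing average."""
    return total_seconds / num_elements


def per_element_cost_floored(total_seconds, num_elements):
    """Floor division, as produced by the over-eager conversion."""
    return total_seconds // num_elements


if __name__ == "__main__":
    # Half a second spread over 100 elements.
    print(per_element_cost(0.5, 100))          # a small positive float
    print(per_element_cost_floored(0.5, 100))  # floored down to 0.0
    # For ints that were meant to be integer division, // is the right choice.
    print(7 // 2, 7 / 2)
```

Any sub-second per-element cost is reported as zero by the floored version, which is how the regression showed up in the microbenchmarks.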

> Fix over-aggressive division futurization.
> --
>
> Key: BEAM-5255
> URL: https://issues.apache.org/jira/browse/BEAM-5255
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.6.0
>Reporter: Robert Bradshaw
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When converting from Python 2 to Python 3, `a / b` becomes `a // b` only for 
> ints, but it is incorrect to do this substitution for floating point 
> division. 
>  
> I noticed this change in the microbenchmarks, but we should do an audit to 
> make sure we haven't broken things elsewhere. 





[jira] [Work logged] (BEAM-5255) Fix over-aggressive division futurization.

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5255?focusedWorklogId=139300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139300
 ]

ASF GitHub Bot logged work on BEAM-5255:


Author: ASF GitHub Bot
Created on: 29/Aug/18 15:57
Start Date: 29/Aug/18 15:57
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #6293: [BEAM-5255] Fix 
over-aggressive division futurization in benchmarks.
URL: https://github.com/apache/beam/pull/6293#issuecomment-417005802
 
 
   LGTM thank you. See also: https://issues.apache.org/jira/browse/BEAM-4858.




Issue Time Tracking
---

Worklog Id: (was: 139300)
Time Spent: 0.5h  (was: 20m)

> Fix over-aggressive division futurization.
> --
>
> Key: BEAM-5255
> URL: https://issues.apache.org/jira/browse/BEAM-5255
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.6.0
>Reporter: Robert Bradshaw
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When converting from Python 2 to Python 3, `a / b` becomes `a // b` only for 
> ints, but it is incorrect to do this substitution for floating point 
> division. 
>  
> I noticed this change in the microbenchmarks, but we should do an audit to 
> make sure we haven't broken things elsewhere. 





[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Jozef Vilcek (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596542#comment-16596542
 ] 

Jozef Vilcek commented on BEAM-5036:


Aha, I see. I did not know it might have worked differently before. Maybe the 
"write to temp file -> promote to final destination" step is new with WriteFiles?

Sure, every unnecessary IO is suboptimal. Agreed.

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename(), which can be much more 
> efficient on some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[beam] 01/01: Merge pull request #6293 [BEAM-5255] Fix over-aggressive division futurization in benchmarks.

2018-08-29 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git

commit e278e57077d8ab6459a9833eccd2ef20dc2faeae
Merge: 053b1d8 4e5c642
Author: Robert Bradshaw 
AuthorDate: Wed Aug 29 18:05:09 2018 +0200

Merge pull request #6293 [BEAM-5255] Fix over-aggressive division 
futurization in benchmarks.

[BEAM-5255] Fix over-aggressive division futurization in benchmarks.

 .../python/apache_beam/tools/distribution_counter_microbenchmark.py | 4 ++--
 sdks/python/apache_beam/tools/map_fn_microbenchmark.py  | 2 +-
 sdks/python/apache_beam/tools/sideinput_microbenchmark.py   | 6 +++---
 3 files changed, 6 insertions(+), 6 deletions(-)



[beam] branch master updated (053b1d8 -> e278e57)

2018-08-29 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 053b1d8  Merge pull request #6279 from 
echauchot/BEAM-5172-flaky-es2-tests
 add 4e5c642  [BEAM-5255] Fix over-aggressive division futurization in 
benchmarks.
 new e278e57  Merge pull request #6293 [BEAM-5255] Fix over-aggressive 
division futurization in benchmarks.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../python/apache_beam/tools/distribution_counter_microbenchmark.py | 4 ++--
 sdks/python/apache_beam/tools/map_fn_microbenchmark.py  | 2 +-
 sdks/python/apache_beam/tools/sideinput_microbenchmark.py   | 6 +++---
 3 files changed, 6 insertions(+), 6 deletions(-)



[jira] [Work logged] (BEAM-5255) Fix over-aggressive division futurization.

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5255?focusedWorklogId=139301&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139301
 ]

ASF GitHub Bot logged work on BEAM-5255:


Author: ASF GitHub Bot
Created on: 29/Aug/18 16:05
Start Date: 29/Aug/18 16:05
Worklog Time Spent: 10m 
  Work Description: robertwb closed pull request #6293: [BEAM-5255] Fix 
over-aggressive division futurization in benchmarks.
URL: https://github.com/apache/beam/pull/6293
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/sdks/python/apache_beam/tools/distribution_counter_microbenchmark.py b/sdks/python/apache_beam/tools/distribution_counter_microbenchmark.py
index 1e0809f0c11..06035d55872 100644
--- a/sdks/python/apache_beam/tools/distribution_counter_microbenchmark.py
+++ b/sdks/python/apache_beam/tools/distribution_counter_microbenchmark.py
@@ -60,8 +60,8 @@ def run_benchmark(num_runs=100, num_input=1, seed=time.time()):
     counter.add_inputs_for_test(inputs)
     time_cost = time.time() - start
     print("Run %d: Total time cost %g sec" % (i+1, time_cost))
-    total_time += time_cost // num_input
-  print("Per element update time cost:", total_time // num_runs)
+    total_time += time_cost / num_input
+  print("Per element update time cost:", total_time / num_runs)
 
 
 if __name__ == '__main__':
diff --git a/sdks/python/apache_beam/tools/map_fn_microbenchmark.py b/sdks/python/apache_beam/tools/map_fn_microbenchmark.py
index 116c28e853e..6b4a143e2e1 100644
--- a/sdks/python/apache_beam/tools/map_fn_microbenchmark.py
+++ b/sdks/python/apache_beam/tools/map_fn_microbenchmark.py
@@ -61,7 +61,7 @@ def run_benchmark(num_maps=100, num_runs=10, num_elements_step=1000):
   gradient, intercept, r_value, p_value, std_err = stats.linregress(
       *list(zip(*list(timings.items()))))
   print("Fixed cost  ", intercept)
-  print("Per-element ", gradient // num_maps)
+  print("Per-element ", gradient / num_maps)
   print("R^2 ", r_value**2)
 
 
diff --git a/sdks/python/apache_beam/tools/sideinput_microbenchmark.py b/sdks/python/apache_beam/tools/sideinput_microbenchmark.py
index f517304302c..8754d8604eb 100644
--- a/sdks/python/apache_beam/tools/sideinput_microbenchmark.py
+++ b/sdks/python/apache_beam/tools/sideinput_microbenchmark.py
@@ -69,10 +69,10 @@ def run_benchmark(num_runs=50, input_per_source=4000, num_sources=4):
 
   print("Runtimes:", times)
 
-  avg_runtime = sum(times) // len(times)
+  avg_runtime = sum(times) / len(times)
   print("Average runtime:", avg_runtime)
-  print("Time per element:", avg_runtime // (input_per_source *
- num_sources))
+  print("Time per element:", avg_runtime / (input_per_source *
+num_sources))
 
 
 if __name__ == '__main__':


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139301)
Time Spent: 40m  (was: 0.5h)

> Fix over-aggressive division futurization.
> --
>
> Key: BEAM-5255
> URL: https://issues.apache.org/jira/browse/BEAM-5255
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.6.0
>Reporter: Robert Bradshaw
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When converting from Python 2 to Python 3, `a / b` becomes `a // b` only for 
> ints, but it is incorrect to do this substitution for floating point 
> division. 
>  
> I noticed this change in the microbenchmarks, but we should do an audit to 
> make sure we haven't broken things elsewhere. 
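
A minimal sketch of why the substitution is wrong for floats, assuming
Python 3 semantics. The function names are hypothetical and are not taken
from the Beam benchmarks; they only contrast the two operators on a timing
value.

```python
def per_element_floor(total_seconds, num_elements):
    # Floor division: rounds the float quotient down to a whole number.
    return total_seconds // num_elements


def per_element_true(total_seconds, num_elements):
    # True division: keeps the fractional part, which a timing needs.
    return total_seconds / num_elements


elapsed = 0.25   # seconds spent on the whole batch (made-up value)
count = 1000     # elements in the batch (made-up value)

print(per_element_floor(elapsed, count))  # 0.0
print(per_element_true(elapsed, count))   # 0.00025

# For ints, `//` in Python 3 matches what `/` did in Python 2.
print(7 // 2)  # 3
```

With `//`, any per-element cost below one second is reported as 0.0, which
is presumably how the regression showed up in the microbenchmark output.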





[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Reuven Lax (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596545#comment-16596545
 ] 

Reuven Lax commented on BEAM-5036:
--

As to ignoring errors: the one thing we need to make sure is that the operation 
is idempotent. The bundle might fail at any point and get retried, and the 
retry should succeed if possible.

For filesystems that use copy/delete, this means that we should ignore 
file-already-exists errors. Otherwise retrying the bundle will cause a 
permanent failure as the transform gets retried, and eventually fail the job 
(depending on runner).

For filesystems such as HDFS (or local) for which atomic rename exists, this 
means we have to ignore failures where the _source_ file doesn't exist (we also 
have to do this with GCS/S3). I believe the code already attempts to do this 
with IGNORE_MISSING_FILES, though there are slight race conditions in that 
check today.
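
The two cases above can be sketched on a local filesystem as follows. This
is an illustration under stated assumptions, not Beam's FileBasedSink code:
`move_by_rename` and `move_by_copy_delete` are hypothetical helpers, and the
exists-then-act checks carry the same kind of race mentioned above.

```python
import os
import shutil
import tempfile


def move_by_rename(src, dst):
    """Rename src to dst; a missing source counts as success if dst exists."""
    try:
        os.replace(src, dst)  # atomic on POSIX within one filesystem
    except FileNotFoundError:
        # An earlier attempt may already have moved the file.
        if not os.path.exists(dst):
            raise


def move_by_copy_delete(src, dst):
    """Copy then delete; a destination left by an earlier attempt is kept."""
    if os.path.exists(src):
        if not os.path.exists(dst):
            shutil.copyfile(src, dst)
        os.remove(src)
    elif not os.path.exists(dst):
        raise FileNotFoundError(src)


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "temp-shard")
dst = os.path.join(workdir, "final-shard")
with open(src, "w") as f:
    f.write("payload")

move_by_rename(src, dst)
move_by_rename(src, dst)  # simulated bundle retry: must not raise
```

Calling either helper a second time with the same arguments leaves the
destination unchanged, which is the idempotence a retried bundle relies on.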

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> moveToOutput() methods in FileBasedSink.WriteOperation implements move by 
> copy+delete. It would be better to use a rename() which can be much more 
> effective for some filesystems.
> Filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Reuven Lax (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596548#comment-16596548
 ] 

Reuven Lax commented on BEAM-5036:
--

[~sinisa_lyh] I don't think there is any semantically-correct way to write 
directly to final files (unless you're ok with incomplete or corrupted output), 
and I don't think Beam ever did that. If workers crash, etc. you'll end up with 
partially-written files. What's more, there's no guarantee that a retry will 
write the exact same data in the files.
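
A rough sketch of the temp-file-then-move pattern this implies, on a local
filesystem. `write_then_publish` is a hypothetical helper, not a Beam API;
the point is only that the final path never holds partially written data.

```python
import os
import tempfile


def write_then_publish(final_path, lines):
    """Write lines to a temp file, then move it to final_path in one step."""
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            for line in lines:
                f.write(line + "\n")
        # Readers see either no file or the complete file, never a prefix.
        os.replace(tmp_path, final_path)
    except BaseException:
        # A failed attempt leaves only the temp file, which is discarded.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


outdir = tempfile.mkdtemp()
write_then_publish(os.path.join(outdir, "out.txt"), ["a", "b"])
```

If the writer fails midway, the final path is untouched, so a retry that
produces different data cannot mix with output from the failed attempt.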



[jira] [Comment Edited] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Neville Li (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596565#comment-16596565
 ] 

Neville Li edited comment on BEAM-5036 at 8/29/18 4:23 PM:
---

Yeah that's what I figured. So there's no way to reduce this overhead on GCS 
unless GCS starts to support efficient object {{rename}}.


was (Author: sinisa_lyh):
Yeah that's why I figured. So there's no way to reduce this overhead on GCS 
unless if GCS starts to support efficient object {{rename}}.



[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Neville Li (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596565#comment-16596565
 ] 

Neville Li commented on BEAM-5036:
--

Yeah that's why I figured. So there's no way to reduce this overhead on GCS 
unless if GCS starts to support efficient object {{rename}}.



[jira] [Assigned] (BEAM-5191) Add support for writing to BigQuery clustered tables

2018-08-29 Thread Luke Cwik (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-5191:
---

Assignee: Wout Scheepers  (was: Chamikara Jayalath)

> Add support for writing to BigQuery clustered tables
> 
>
> Key: BEAM-5191
> URL: https://issues.apache.org/jira/browse/BEAM-5191
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.6.0
>Reporter: Robert Sahlin
>Assignee: Wout Scheepers
>Priority: Minor
>  Labels: features, newbie
>
> Google recently added support for clustered tables in BigQuery. It would be 
> useful to set clustering columns the same way as for partitioning. It should 
> support multiple fields (4) for clustering.
> For example:
> [BigQueryIO.Write|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]<[T|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]>
>  .withClustering(new Clustering().setField("productId").setType("STRING"))





[jira] [Work logged] (BEAM-4495) Create website pre-commits for apache/beam repository

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4495?focusedWorklogId=139308&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139308
 ]

ASF GitHub Bot logged work on BEAM-4495:


Author: ASF GitHub Bot
Created on: 29/Aug/18 16:38
Start Date: 29/Aug/18 16:38
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #6282: [BEAM-4495] Website 
pre-commit job
URL: https://github.com/apache/beam/pull/6282#issuecomment-417020101
 
 
   R: @pabloem 




Issue Time Tracking
---

Worklog Id: (was: 139308)
Time Spent: 5h  (was: 4h 50m)

> Create website pre-commits for apache/beam repository
> -
>
> Key: BEAM-4495
> URL: https://issues.apache.org/jira/browse/BEAM-4495
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Scott Wegner
>Assignee: Udi Meiri
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 5h
>  Remaining Estimate: 0h
>






[jira] [Created] (BEAM-5257) Java TextIO should accept an optional Encoding parameter

2018-08-29 Thread Peter Farkas (JIRA)
Peter Farkas created BEAM-5257:
--

 Summary: Java TextIO should accept an optional Encoding parameter
 Key: BEAM-5257
 URL: https://issues.apache.org/jira/browse/BEAM-5257
 Project: Beam
  Issue Type: Improvement
  Components: io-java-text
Reporter: Peter Farkas
Assignee: Eugene Kirpichov








[jira] [Created] (BEAM-5258) Investigate if we can disable Row type flattening

2018-08-29 Thread Rui Wang (JIRA)
Rui Wang created BEAM-5258:
--

 Summary: Investigate if we can disable Row type flattening
 Key: BEAM-5258
 URL: https://issues.apache.org/jira/browse/BEAM-5258
 Project: Beam
  Issue Type: Sub-task
  Components: dsl-sql
Reporter: Rui Wang
Assignee: Rui Wang


Disabling the flattening in either PlannerImpl or Flattener could be a good start.





[jira] [Created] (BEAM-5259) Test nested select query with nested Row

2018-08-29 Thread Rui Wang (JIRA)
Rui Wang created BEAM-5259:
--

 Summary: Test nested select query with nested Row
 Key: BEAM-5259
 URL: https://issues.apache.org/jira/browse/BEAM-5259
 Project: Beam
  Issue Type: Sub-task
  Components: dsl-sql
Reporter: Rui Wang
Assignee: Rui Wang


Test something like

 

SELECT i.row.string from (

select row from table where id = 1

) as i;





[jira] [Created] (BEAM-5260) Investigate if nested row constructor works

2018-08-29 Thread Rui Wang (JIRA)
Rui Wang created BEAM-5260:
--

 Summary: Investigate if nested row constructor works 
 Key: BEAM-5260
 URL: https://issues.apache.org/jira/browse/BEAM-5260
 Project: Beam
  Issue Type: Sub-task
  Components: dsl-sql
Reporter: Rui Wang
Assignee: Rui Wang


Check if 

 

SELECT ROW(row.string, ROW(row.row.int, row.row.long)) works.





[jira] [Updated] (BEAM-5260) Investigate if nested row constructor works

2018-08-29 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-5260:
---
Description: 
Check if 

 

```SELECT ROW(row.string, ROW(row.row.int, row.row.long))``` works.

  was:
Check if 

 

SELECT ROW(row.string, ROW(row.row.int, row.row.long)) works.


> Investigate if nested row constructor works 
> 
>
> Key: BEAM-5260
> URL: https://issues.apache.org/jira/browse/BEAM-5260
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>
> Check if 
>  
> ```SELECT ROW(row.string, ROW(row.row.int, row.row.long))``` works.





[jira] [Updated] (BEAM-5260) Investigate if nested row constructor works

2018-08-29 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-5260:
---
Description: 
Check if 

 

`SELECT ROW(row.string, ROW(row.row.int, row.row.long))` works.

  was:
Check if 

 

```SELECT ROW(row.string, ROW(row.row.int, row.row.long))``` works.


> Investigate if nested row constructor works 
> 
>
> Key: BEAM-5260
> URL: https://issues.apache.org/jira/browse/BEAM-5260
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>
> Check if 
>  
> `SELECT ROW(row.string, ROW(row.row.int, row.row.long))` works.





[jira] [Updated] (BEAM-5260) Investigate if nested row constructor works

2018-08-29 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-5260:
---
Description: 
Check if 

 
{code:java}
SELECT ROW(row.string, ROW(row.row.int, row.row.long)){code}
{{works.}}

  was:
Check if 

 

`SELECT ROW(row.string, ROW(row.row.int, row.row.long))` works.


> Investigate if nested row constructor works 
> 
>
> Key: BEAM-5260
> URL: https://issues.apache.org/jira/browse/BEAM-5260
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>
> Check if 
>  
> {code:java}
> SELECT ROW(row.string, ROW(row.row.int, row.row.long)){code}
> {{works.}}





[jira] [Updated] (BEAM-5259) Test nested select query with nested Row

2018-08-29 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-5259:
---
Description: 
add tests to something like

 

 
{code:java}
SELECT i.row.string from (
select row from table where id = 1
) as i;
{code}
, if it does not work, have a solution to fix it.

 

  was:
Test something like

 

SELECT i.row.string from (

select row from table where id = 1

) as i;


> Test nested select query with nested Row
> 
>
> Key: BEAM-5259
> URL: https://issues.apache.org/jira/browse/BEAM-5259
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>
> add tests to something like
>  
>  
> {code:java}
> SELECT i.row.string from (
> select row from table where id = 1
> ) as i;
> {code}
> , if it does not work, have a solution to fix it.
>  





[jira] [Updated] (BEAM-5258) Investigate if we can disable Row type flattening in Calcite

2018-08-29 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-5258:
---
Summary: Investigate if we can disable Row type flattening in Calcite  
(was: Investigate if we can disable Row type flattening)

> Investigate if we can disable Row type flattening in Calcite
> 
>
> Key: BEAM-5258
> URL: https://issues.apache.org/jira/browse/BEAM-5258
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>
> Disabling the flattening in either PlannerImpl or Flattener could be a good 
> start.





[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139358
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 18:09
Start Date: 29/Aug/18 18:09
Worklog Time Spent: 10m 
  Work Description: tweise commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213780896
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessEnvironmentFactory.java
 ##
 @@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import com.google.common.collect.ImmutableList;
+import java.time.Duration;
+import java.util.List;
+import java.util.concurrent.TimeoutException;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.control.ControlClientPool;
+import org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService;
+import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * An {@link EnvironmentFactory} which forks processes based on the given URL in the Environment.
+ * The returned {@link ProcessEnvironment} has to make sure to stop the processes.
+ */
+public class ProcessEnvironmentFactory implements EnvironmentFactory {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ProcessEnvironmentFactory.class);
+
+  public static ProcessEnvironmentFactory create(
+      GrpcFnServer<FnApiControlClientPoolService> controlServiceServer,
+      GrpcFnServer<GrpcLoggingService> loggingServiceServer,
+      GrpcFnServer<ArtifactRetrievalService> retrievalServiceServer,
+      GrpcFnServer<StaticGrpcProvisionService> provisioningServiceServer,
+      ControlClientPool.Source clientSource,
+      IdGenerator idGenerator) {
+    return create(
+        ProcessManager.getDefault(),
+        controlServiceServer,
+        loggingServiceServer,
+        retrievalServiceServer,
+        provisioningServiceServer,
+        clientSource,
+        idGenerator);
+  }
+
+  static ProcessEnvironmentFactory create(
+      ProcessManager processManager,
+      GrpcFnServer<FnApiControlClientPoolService> controlServiceServer,
+      GrpcFnServer<GrpcLoggingService> loggingServiceServer,
+      GrpcFnServer<ArtifactRetrievalService> retrievalServiceServer,
+      GrpcFnServer<StaticGrpcProvisionService> provisioningServiceServer,
+      ControlClientPool.Source clientSource,
+      IdGenerator idGenerator) {
+    return new ProcessEnvironmentFactory(
+        processManager,
+        controlServiceServer,
+        loggingServiceServer,
+        retrievalServiceServer,
+        provisioningServiceServer,
+        idGenerator,
+        clientSource);
+  }
+
+  private final ProcessManager processManager;
+  private final GrpcFnServer<FnApiControlClientPoolService> controlServiceServer;
+  private final GrpcFnServer<GrpcLoggingService> loggingServiceServer;
+  private final GrpcFnServer<ArtifactRetrievalService> retrievalServiceServer;
+  private final GrpcFnServer<StaticGrpcProvisionService> provisioningServiceServer;
+  private final IdGenerator idGenerator;
+  private final ControlClientPool.Source clientSource;
+
+  private ProcessEnvironmentFactory(
+      ProcessManager processManager,
+      GrpcFnServer<FnApiControlClientPoolService> controlServiceServer,
+      GrpcFnServer<GrpcLoggingService> loggingServiceServer,
+      GrpcFnServer<ArtifactRetrievalService> retrievalServiceServer,
+      GrpcFnServer<StaticGrpcProvisionService> provisioningServiceServer,
+      IdGenerator idGenerator,
+      ControlClientPool.Source clientSource) {
+    this.processManager = processManager;
+    this.controlServiceServer = controlServiceServer;
+    this.loggingServiceServer = loggingServiceServer;
+    this.retrievalServiceServer = retrievalServiceServer;
+this.provisioningServiceServer = provisioningServiceSer

[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139359
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 18:10
Start Date: 29/Aug/18 18:10
Worklog Time Spent: 10m 
  Work Description: tweise commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213781303
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessEnvironmentFactory.java
 ##
 @@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import com.google.common.collect.ImmutableList;
+import java.time.Duration;
+import java.util.List;
+import java.util.concurrent.TimeoutException;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.control.ControlClientPool;
+import org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService;
+import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * An {@link EnvironmentFactory} which forks processes based on the given URL in the Environment.
+ * The returned {@link ProcessEnvironment} has to make sure to stop the processes.
+ */
+public class ProcessEnvironmentFactory implements EnvironmentFactory {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ProcessEnvironmentFactory.class);
+
+  public static ProcessEnvironmentFactory create(
+      GrpcFnServer<FnApiControlClientPoolService> controlServiceServer,
+      GrpcFnServer<GrpcLoggingService> loggingServiceServer,
+      GrpcFnServer<ArtifactRetrievalService> retrievalServiceServer,
+      GrpcFnServer<StaticGrpcProvisionService> provisioningServiceServer,
+      ControlClientPool.Source clientSource,
+      IdGenerator idGenerator) {
+    return create(
+        ProcessManager.getDefault(),
+        controlServiceServer,
+        loggingServiceServer,
+        retrievalServiceServer,
+        provisioningServiceServer,
+        clientSource,
+        idGenerator);
+  }
+
+  static ProcessEnvironmentFactory create(
+      ProcessManager processManager,
+      GrpcFnServer<FnApiControlClientPoolService> controlServiceServer,
+      GrpcFnServer<GrpcLoggingService> loggingServiceServer,
+      GrpcFnServer<ArtifactRetrievalService> retrievalServiceServer,
+      GrpcFnServer<StaticGrpcProvisionService> provisioningServiceServer,
+      ControlClientPool.Source clientSource,
+      IdGenerator idGenerator) {
+    return new ProcessEnvironmentFactory(
+        processManager,
+        controlServiceServer,
+        loggingServiceServer,
+        retrievalServiceServer,
+        provisioningServiceServer,
+        idGenerator,
+        clientSource);
+  }
+
+  private final ProcessManager processManager;
+  private final GrpcFnServer<FnApiControlClientPoolService> controlServiceServer;
+  private final GrpcFnServer<GrpcLoggingService> loggingServiceServer;
+  private final GrpcFnServer<ArtifactRetrievalService> retrievalServiceServer;
+  private final GrpcFnServer<StaticGrpcProvisionService> provisioningServiceServer;
+  private final IdGenerator idGenerator;
+  private final ControlClientPool.Source clientSource;
+
+  private ProcessEnvironmentFactory(
+      ProcessManager processManager,
+      GrpcFnServer<FnApiControlClientPoolService> controlServiceServer,
+      GrpcFnServer<GrpcLoggingService> loggingServiceServer,
+      GrpcFnServer<ArtifactRetrievalService> retrievalServiceServer,
+      GrpcFnServer<StaticGrpcProvisionService> provisioningServiceServer,
+      IdGenerator idGenerator,
+      ControlClientPool.Source clientSource) {
+    this.processManager = processManager;
+    this.controlServiceServer = controlServiceServer;
+    this.loggingServiceServer = loggingServiceServer;
+    this.retrievalServiceServer = retrievalServiceServer;
+this.provisioningServiceServer = provisioningServiceSer

[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139361&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139361
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 18:15
Start Date: 29/Aug/18 18:15
Worklog Time Spent: 10m 
  Work Description: tweise commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213782630
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessManager.java
 ##
 @@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.fnexecution.environment;
+
+import static com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.ImmutableList;
+import java.io.File;
+import java.io.IOException;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.concurrent.ThreadSafe;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A simple process manager which forks processes and kills them if necessary. */
+@ThreadSafe
+class ProcessManager {
+  private static final Logger LOG = LoggerFactory.getLogger(ProcessManager.class);
+
+  private static final ProcessManager INSTANCE = new ProcessManager();
+
+  private final Map processes;
+
+  public static ProcessManager getInstance() {
+return INSTANCE;
+  }
+
+  private ProcessManager() {
+this.processes = Collections.synchronizedMap(new HashMap<>());
+  }
+
+  static class RunningProcess {
+private Process process;
+
+RunningProcess(Process process) {
+  this.process = process;
+}
+
+/** Checks if the underlying process is still running. */
+void isAliveOrThrow() throws IllegalStateException {
+  if (!process.isAlive()) {
+throw new IllegalStateException("Process died with exit code " + process.exitValue());
+  }
+}
+
+@VisibleForTesting
+Process getUnderlyingProcess() {
+  return process;
+}
+  }
+
+  /**
+   * Forks a process with the given command and arguments.
+   *
+   * @param id A unique id for the process
+   * @param command the name of the executable to run
+   * @param args arguments to provide to the executable
+   * @return A RunningProcess which can be checked for liveness
+   */
+  RunningProcess startProcess(String id, String command, List args) throws IOException {
+return startProcess(id, command, args, Collections.emptyMap());
+  }
+
+  /**
+   * Forks a process with the given command, arguments, and additional environment variables.
+   *
+   * @param id A unique id for the process
+   * @param command The name of the executable to run
+   * @param args Arguments to provide to the executable
+   * @param env Additional environment variables for the process to be forked
+   * @return A RunningProcess which can be checked for liveness
+   */
+  RunningProcess startProcess(String id, String command, List args, Map env)
+  throws IOException {
+checkNotNull(id, "Process id must not be null");
+checkNotNull(command, "Command must not be null");
+checkNotNull(args, "Process args must not be null");
+checkNotNull(env, "Environment map must not be null");
+
+ProcessBuilder pb =
+new ProcessBuilder(ImmutableList.builder().add(command).addAll(args).build());
+pb.environment().putAll(env);
+
+LOG.debug("Attempting to start process with command: " + pb.command());
+// Pipe stdout and stderr to /dev/null to avoid blocking the process due to filled PIPE buffer
+pb.redirectErrorStream(true);
 
 Review comment:
   Consider inheriting stderr (and possibly stdout as well), perhaps as an 
option, since issues can be hard to diagnose when the output isn't available.
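
   A minimal sketch of what an optional inherit mode could look like, using 
only the standard `java.lang.ProcessBuilder` API. The class name 
`InheritOutputSketch` and the `inheritOutput` flag are hypothetical and not 
part of the pull request; the `/dev/null` target assumes a Unix-like system.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

class InheritOutputSketch {
  /**
   * Builds a ProcessBuilder for the given command. The inheritOutput flag is a
   * hypothetical option, not something the reviewed code defines.
   */
  public static ProcessBuilder configure(List<String> command, boolean inheritOutput) {
    ProcessBuilder pb = new ProcessBuilder(command);
    if (inheritOutput) {
      // Keep stderr separate and hand both streams to the parent process,
      // so the child's output appears wherever the parent's output goes.
      pb.redirectErrorStream(false);
      pb.redirectOutput(ProcessBuilder.Redirect.INHERIT);
      pb.redirectError(ProcessBuilder.Redirect.INHERIT);
    } else {
      // Merge stderr into stdout and send both to /dev/null, so a full pipe
      // buffer cannot block the child process.
      pb.redirectErrorStream(true);
      pb.redirectOutput(ProcessBuilder.Redirect.to(new File("/dev/null")));
    }
    return pb;
  }

  public static void main(String[] args) {
    // Only inspects the configuration; no process is started here.
    ProcessBuilder pb = configure(Arrays.asList("echo", "hello"), true);
    System.out.println("stderr redirect type: " + pb.redirectError().type());
  }
}
```

   Note that when `redirectErrorStream(true)` is set, any separate stderr 
redirect is ignored, which is why the inherit branch switches it off first.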


This is an automated message from the Apache Git Service.

[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139367
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 18:30
Start Date: 29/Aug/18 18:30
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #6287: [BEAM-5187] Add a 
ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#issuecomment-417058004
 
 
   @mxm With a customization, I got the Python worker to come up without 
boot.go 
(https://github.com/tweise/beam/commit/006ad96de84a839f8f07abca70c67f5549eca0a1)
   
   Please see the diff; a few changes would be needed in your PR to support 
it (primarily modifiers).
   
   By the way, I noticed that after the job server shuts down, launched 
processes stick around and don't exit.
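
   One possible way to address leftover processes is a JVM shutdown hook that 
destroys anything still registered. This is only a sketch under assumptions: 
`ProcessReaperSketch`, `register`, `stopAll`, and `installShutdownHook` are 
hypothetical names, not part of the PR, and shutdown hooks do not run if the 
JVM is killed forcibly.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ProcessReaperSketch {
  // Hypothetical registry of forked processes, keyed by a caller-chosen id.
  private final Map<String, Process> processes = new ConcurrentHashMap<>();

  public void register(String id, Process process) {
    processes.put(id, process);
  }

  /** Destroys every registered process that is still alive; returns how many. */
  public int stopAll() {
    int stopped = 0;
    for (Process process : processes.values()) {
      if (process.isAlive()) {
        process.destroy();
        stopped++;
      }
    }
    processes.clear();
    return stopped;
  }

  /** Asks the JVM to call stopAll() during an orderly shutdown. */
  public void installShutdownHook() {
    Runtime.getRuntime().addShutdownHook(new Thread(() -> stopAll(), "process-reaper"));
  }

  public static void main(String[] args) {
    ProcessReaperSketch reaper = new ProcessReaperSketch();
    reaper.installShutdownHook();
    System.out.println("Stopped " + reaper.stopAll() + " processes");
  }
}
```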
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139367)
Time Spent: 4h 50m  (was: 4h 40m)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up 
> similarly to the existing bootstrapping code ("boot.go") which we use for 
> containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[beam] branch master updated: Moving to 2.8.0-SNAPSHOT on master branch.

2018-08-29 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
 new 693de50  Moving to 2.8.0-SNAPSHOT on master branch.
693de50 is described below

commit 693de50164926983c9702323e78d5edcb6eb9aa3
Author: Charles Chen 
AuthorDate: Wed Aug 29 12:14:22 2018 -0700

Moving to 2.8.0-SNAPSHOT on master branch.
---
 buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy | 2 +-
 gradle.properties   | 2 +-
 sdks/python/apache_beam/version.py  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
index 27d6661..bbfbfb9 100644
--- a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
+++ b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
@@ -232,7 +232,7 @@ class BeamModulePlugin implements Plugin {
 
 // Automatically use the official release version if we are performing a release
 // otherwise append '-SNAPSHOT'
-project.version = '2.7.0'
+project.version = '2.8.0'
 if (!isRelease(project)) {
   project.version += '-SNAPSHOT'
 }
diff --git a/gradle.properties b/gradle.properties
index 1e66e7c..ba6555a 100644
--- a/gradle.properties
+++ b/gradle.properties
@@ -22,4 +22,4 @@ offlineRepositoryRoot=offline-repository
 signing.gnupg.executable=gpg
 signing.gnupg.useLegacyGpg=true
 
-version=2.7.0-SNAPSHOT
+version=2.8.0-SNAPSHOT
diff --git a/sdks/python/apache_beam/version.py b/sdks/python/apache_beam/version.py
index 2888107..48d221a 100644
--- a/sdks/python/apache_beam/version.py
+++ b/sdks/python/apache_beam/version.py
@@ -18,4 +18,4 @@
 """Apache Beam SDK version information and utilities."""
 
 
-__version__ = '2.7.0.dev'
+__version__ = '2.8.0.dev'



[beam] branch release-2.7.0 created (now bd9dc3f)

2018-08-29 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a change to branch release-2.7.0
in repository https://gitbox.apache.org/repos/asf/beam.git.


  at bd9dc3f  Create release branch for version 2.7.0.

This branch includes the following new commits:

 new bd9dc3f  Create release branch for version 2.7.0.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[beam] 01/01: Create release branch for version 2.7.0.

2018-08-29 Thread ccy
This is an automated email from the ASF dual-hosted git repository.

ccy pushed a commit to branch release-2.7.0
in repository https://gitbox.apache.org/repos/asf/beam.git

commit bd9dc3fe79702ead2948eea8c13955c1e50e464b
Author: Charles Chen 
AuthorDate: Wed Aug 29 12:15:49 2018 -0700

Create release branch for version 2.7.0.
---
 runners/google-cloud-dataflow-java/build.gradle | 2 +-
 sdks/python/apache_beam/version.py  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/runners/google-cloud-dataflow-java/build.gradle b/runners/google-cloud-dataflow-java/build.gradle
index 9682e94..acaba23 100644
--- a/runners/google-cloud-dataflow-java/build.gradle
+++ b/runners/google-cloud-dataflow-java/build.gradle
@@ -36,7 +36,7 @@ processResources {
   filter org.apache.tools.ant.filters.ReplaceTokens, tokens: [
 'dataflow.legacy_environment_major_version' : '7',
 'dataflow.fnapi_environment_major_version' : '7',
-'dataflow.container_version' : 'beam-master-20180730'
+'dataflow.container_version' : 'beam-2.7.0'
   ]
 }
 
diff --git a/sdks/python/apache_beam/version.py b/sdks/python/apache_beam/version.py
index 2888107..2bf27d2 100644
--- a/sdks/python/apache_beam/version.py
+++ b/sdks/python/apache_beam/version.py
@@ -18,4 +18,4 @@
 """Apache Beam SDK version information and utilities."""
 
 
-__version__ = '2.7.0.dev'
+__version__ = '2.7.0'



[jira] [Closed] (BEAM-5027) Schemas do not work on Dataflow runner of FnApi Runner

2018-08-29 Thread Reuven Lax (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuven Lax closed BEAM-5027.

   Resolution: Fixed
Fix Version/s: 2.7.0

> Schemas do not work on Dataflow runner of FnApi Runner
> --
>
> Key: BEAM-5027
> URL: https://issues.apache.org/jira/browse/BEAM-5027
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
> Fix For: 2.7.0
>
>






[jira] [Work logged] (BEAM-4461) Create a library of useful transforms that use schemas

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4461?focusedWorklogId=139389&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139389
 ]

ASF GitHub Bot logged work on BEAM-4461:


Author: ASF GitHub Bot
Created on: 29/Aug/18 19:42
Start Date: 29/Aug/18 19:42
Worklog Time Spent: 10m 
  Work Description: reuvenlax opened a new pull request #6298: [BEAM-4461] 
Introduce Group transform.
URL: https://github.com/apache/beam/pull/6298
 
 
   




Issue Time Tracking
---

Worklog Id: (was: 139389)
Time Spent: 2h 40m  (was: 2.5h)

> Create a library of useful transforms that use schemas
> --
>
> Key: BEAM-4461
> URL: https://issues.apache.org/jira/browse/BEAM-4461
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> e.g. JoinBy(fields), Project, Filter, etc.





[jira] [Work logged] (BEAM-4495) Create website pre-commits for apache/beam repository

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4495?focusedWorklogId=139406&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139406
 ]

ASF GitHub Bot logged work on BEAM-4495:


Author: ASF GitHub Bot
Created on: 29/Aug/18 19:55
Start Date: 29/Aug/18 19:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6282: 
[BEAM-4495] Website pre-commit job
URL: https://github.com/apache/beam/pull/6282#discussion_r213770593
 
 

 ##
 File path: website/build.gradle
 ##
 @@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// Define common lifecycle tasks and artifact types
+apply plugin: "base"
+
+def dockerImageTag = 'beam-website'
+def dockerWorkDir = "/repo"
+def buildDir = "$project.rootDir/build/website"
+
+task buildDockerImage(type: Exec) {
+  inputs.files 'Gemfile', 'Gemfile.lock'
+  commandLine 'docker', 'build', '-t', dockerImageTag, '.'
+}
+
+task createDockerContainer(type: Exec) {
+  dependsOn buildDockerImage
+  standardOutput = new ByteArrayOutputStream()
+  ext.containerId = {
+return standardOutput.toString().trim()
+  }
+  commandLine '/bin/bash', '-c',
+"docker create -v $project.rootDir:$dockerWorkDir -u \$(id -u):\$(id -g) 
$dockerImageTag"
+}
+
+task startDockerContainer(type: Exec) {
+  dependsOn createDockerContainer
+  ext.containerId = {
+return createDockerContainer.containerId()
+  }
+  commandLine 'docker', 'start',
+"${->createDockerContainer.containerId()}" // Lazily evaluate containerId.
+}
+
+task stopAndRemoveDockerContainer(type: Exec) {
+  commandLine 'docker', 'rm', '-f', "${->createDockerContainer.containerId()}"
+}
+
+task setupBuildDir(type: Copy) {
+  from('.') {
+include 'Gemfile*'
+include 'Rakefile'
+  }
+  into buildDir
+}
+
+task cleanWebsite(type: Delete) {
+  delete buildDir
+}
+clean.dependsOn cleanWebsite
+
+task buildWebsite(type: Exec) {
+  dependsOn startDockerContainer, setupBuildDir
+  finalizedBy stopAndRemoveDockerContainer
+  inputs.files 'Gemfile.lock', '_config.yml'
+  inputs.dir 'src'
+  outputs.dir "$buildDir/.sass-cache"
+  outputs.dir "$buildDir/content"
+  commandLine 'docker', 'exec', '-w', "$dockerWorkDir/build/website",
+"${->startDockerContainer.containerId()}", '/bin/bash', '-c',
+"""bundle exec jekyll build \
+  --config $dockerWorkDir/website/_config.yml \
+  --incremental \
+  --source $dockerWorkDir/website/src
+  """
+}
+build.dependsOn buildWebsite
+
+task testWebsite(type: Exec) {
+  dependsOn startDockerContainer, buildWebsite
 
 Review comment:
   `buildWebsite` already depends on `startDockerContainer`. Is there a reason 
why we can't remove `startDockerContainer` from here?




Issue Time Tracking
---

Worklog Id: (was: 139406)
Time Spent: 5h 10m  (was: 5h)

> Create website pre-commits for apache/beam repository
> -
>
> Key: BEAM-4495
> URL: https://issues.apache.org/jira/browse/BEAM-4495
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Scott Wegner
>Assignee: Udi Meiri
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4495) Create website pre-commits for apache/beam repository

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4495?focusedWorklogId=139410&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139410
 ]

ASF GitHub Bot logged work on BEAM-4495:


Author: ASF GitHub Bot
Created on: 29/Aug/18 19:55
Start Date: 29/Aug/18 19:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6282: 
[BEAM-4495] Website pre-commit job
URL: https://github.com/apache/beam/pull/6282#discussion_r213812990
 
 

 ##
 File path: website/Rakefile
 ##
 @@ -3,16 +3,18 @@ require 'html-proofer'
 require 'etc'
 
 task :test do
-  FileUtils.rm_rf('./.testcontent')
-  sh "bundle exec jekyll build --config _config.yml,_config_test.yml"
-  HTMLProofer.check_directory("./.testcontent", {
+  HTMLProofer.check_directory("./content", {
 
 Review comment:
   Just for my edification: what are Rakefiles for?




Issue Time Tracking
---

Worklog Id: (was: 139410)
Time Spent: 5h 50m  (was: 5h 40m)

> Create website pre-commits for apache/beam repository
> -
>
> Key: BEAM-4495
> URL: https://issues.apache.org/jira/browse/BEAM-4495
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Scott Wegner
>Assignee: Udi Meiri
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4495) Create website pre-commits for apache/beam repository

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4495?focusedWorklogId=139411&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139411
 ]

ASF GitHub Bot logged work on BEAM-4495:


Author: ASF GitHub Bot
Created on: 29/Aug/18 19:55
Start Date: 29/Aug/18 19:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6282: 
[BEAM-4495] Website pre-commit job
URL: https://github.com/apache/beam/pull/6282#discussion_r213812555
 
 

 ##
 File path: website/Rakefile
 ##
 @@ -3,16 +3,18 @@ require 'html-proofer'
 require 'etc'
 
 
 Review comment:
   This file doesn't need a license?




Issue Time Tracking
---

Worklog Id: (was: 139411)
Time Spent: 5h 50m  (was: 5h 40m)

> Create website pre-commits for apache/beam repository
> -
>
> Key: BEAM-4495
> URL: https://issues.apache.org/jira/browse/BEAM-4495
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Scott Wegner
>Assignee: Udi Meiri
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4495) Create website pre-commits for apache/beam repository

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4495?focusedWorklogId=139408&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139408
 ]

ASF GitHub Bot logged work on BEAM-4495:


Author: ASF GitHub Bot
Created on: 29/Aug/18 19:55
Start Date: 29/Aug/18 19:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6282: 
[BEAM-4495] Website pre-commit job
URL: https://github.com/apache/beam/pull/6282#discussion_r213811284
 
 

 ##
 File path: website/Dockerfile
 ##
 @@ -0,0 +1,33 @@
+###
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+###
+
+# This image contains Ruby and dependencies required to build and test the Beam
+# website. It is used by tasks in build.gradle.
+
+FROM ruby:2.5
+
+WORKDIR /ruby
+RUN gem install bundler
+# Update buildDockerImage's inputs.files if you change this list.
+ADD Gemfile Gemfile.lock /ruby/
+RUN bundle install --deployment --path $GEM_HOME
+
+# Required for website testing using HTMLProofer.
+ENV LC_ALL C.UTF-8
+
+CMD sleep 3600
 
 Review comment:
   Does this come with Jekyll?




Issue Time Tracking
---

Worklog Id: (was: 139408)
Time Spent: 5.5h  (was: 5h 20m)

> Create website pre-commits for apache/beam repository
> -
>
> Key: BEAM-4495
> URL: https://issues.apache.org/jira/browse/BEAM-4495
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Scott Wegner
>Assignee: Udi Meiri
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4495) Create website pre-commits for apache/beam repository

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4495?focusedWorklogId=139412&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139412
 ]

ASF GitHub Bot logged work on BEAM-4495:


Author: ASF GitHub Bot
Created on: 29/Aug/18 19:55
Start Date: 29/Aug/18 19:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #6282: [BEAM-4495] Website 
pre-commit job
URL: https://github.com/apache/beam/pull/6282#issuecomment-417084216
 
 
   Added a few comments. Thanks Udi!




Issue Time Tracking
---

Worklog Id: (was: 139412)
Time Spent: 6h  (was: 5h 50m)

> Create website pre-commits for apache/beam repository
> -
>
> Key: BEAM-4495
> URL: https://issues.apache.org/jira/browse/BEAM-4495
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Scott Wegner
>Assignee: Udi Meiri
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 6h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4495) Create website pre-commits for apache/beam repository

2018-08-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4495?focusedWorklogId=139407&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139407
 ]

ASF GitHub Bot logged work on BEAM-4495:


Author: ASF GitHub Bot
Created on: 29/Aug/18 19:55
Start Date: 29/Aug/18 19:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6282: 
[BEAM-4495] Website pre-commit job
URL: https://github.com/apache/beam/pull/6282#discussion_r213773065
 
 

 ##
 File path: website/build.gradle
 ##
 @@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// Define common lifecycle tasks and artifact types
+apply plugin: "base"
+
+def dockerImageTag = 'beam-website'
+def dockerWorkDir = "/repo"
+def buildDir = "$project.rootDir/build/website"
+
+task buildDockerImage(type: Exec) {
+  inputs.files 'Gemfile', 'Gemfile.lock'
+  commandLine 'docker', 'build', '-t', dockerImageTag, '.'
+}
+
+task createDockerContainer(type: Exec) {
+  dependsOn buildDockerImage
+  standardOutput = new ByteArrayOutputStream()
+  ext.containerId = {
+return standardOutput.toString().trim()
+  }
+  commandLine '/bin/bash', '-c',
+"docker create -v $project.rootDir:$dockerWorkDir -u \$(id -u):\$(id -g) 
$dockerImageTag"
+}
+
+task startDockerContainer(type: Exec) {
+  dependsOn createDockerContainer
+  ext.containerId = {
+return createDockerContainer.containerId()
+  }
+  commandLine 'docker', 'start',
+"${->createDockerContainer.containerId()}" // Lazily evaluate containerId.
 
 Review comment:
   This is new to me. Cool : )




Issue Time Tracking
---

Worklog Id: (was: 139407)
Time Spent: 5h 20m  (was: 5h 10m)

> Create website pre-commits for apache/beam repository
> -
>
> Key: BEAM-4495
> URL: https://issues.apache.org/jira/browse/BEAM-4495
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Scott Wegner
>Assignee: Udi Meiri
>Priority: Major
>  Labels: beam-site-automation-reliability
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>





