[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139116&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139116
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 06:27
Start Date: 29/Aug/18 06:27
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213557385
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessManager.java
 ##
 @@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.fnexecution.environment;
+
+import static com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.common.collect.ImmutableList;
+import java.io.IOException;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.concurrent.ThreadSafe;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A simple process manager which forks processes and kills them if necessary. */
+@ThreadSafe
+class ProcessManager {
+  private static final Logger LOG = LoggerFactory.getLogger(ProcessManager.class);
+
+  private static final ProcessManager INSTANCE = new ProcessManager();
+
+  private final Map processes;
+
+  public static ProcessManager getDefault() {
 
 Review comment:
   Rename `getDefault` to `getInstance`, since this is the only way to get an instance.
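
   The suggested rename can be sketched as follows. This is a minimal illustration, not the code in the PR: the class name `ProcessManagerSketch`, the `Map<String, Process>` element types, and the `runningProcessCount` helper are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the class under review; names are illustrative. */
final class ProcessManagerSketch {
  // Single shared instance, created eagerly when the class is loaded.
  private static final ProcessManagerSketch INSTANCE = new ProcessManagerSketch();

  // Assumed element types; the diff above only shows a raw Map.
  private final Map<String, Process> processes = new ConcurrentHashMap<>();

  // Private constructor, so the accessor below is the only way in.
  private ProcessManagerSketch() {}

  /** Named getInstance because it always returns the same singleton. */
  static ProcessManagerSketch getInstance() {
    return INSTANCE;
  }

  /** Hypothetical helper so the example has observable state. */
  int runningProcessCount() {
    return processes.size();
  }
}
```

   With a private constructor, `getInstance` describes the accessor more precisely than `getDefault`, which could suggest that non-default instances also exist.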


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139116)
Time Spent: 50m  (was: 40m)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similarly 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139120
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 06:27
Start Date: 29/Aug/18 06:27
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213520106
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessEnvironmentFactory.java
 ##
 @@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import com.google.common.collect.ImmutableList;
+import java.time.Duration;
+import java.util.List;
+import java.util.concurrent.TimeoutException;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.control.ControlClientPool;
+import org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService;
+import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * An {@link EnvironmentFactory} which forks processes based on the given URL in the Environment.
+ * The returned {@link ProcessEnvironment} has to make sure to stop the processes.
+ */
+public class ProcessEnvironmentFactory implements EnvironmentFactory {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ProcessEnvironmentFactory.class);
+
+  public static ProcessEnvironmentFactory create(
+      GrpcFnServer controlServiceServer,
+      GrpcFnServer loggingServiceServer,
+      GrpcFnServer retrievalServiceServer,
+      GrpcFnServer provisioningServiceServer,
+      ControlClientPool.Source clientSource,
+      IdGenerator idGenerator) {
+    return create(
+        ProcessManager.getDefault(),
+        controlServiceServer,
+        loggingServiceServer,
+        retrievalServiceServer,
+        provisioningServiceServer,
+        clientSource,
+        idGenerator);
+  }
+
+  static ProcessEnvironmentFactory create(
+      ProcessManager processManager,
+      GrpcFnServer controlServiceServer,
+      GrpcFnServer loggingServiceServer,
+      GrpcFnServer retrievalServiceServer,
+      GrpcFnServer provisioningServiceServer,
+      ControlClientPool.Source clientSource,
+      IdGenerator idGenerator) {
+    return new ProcessEnvironmentFactory(
+        processManager,
+        controlServiceServer,
+        loggingServiceServer,
+        retrievalServiceServer,
+        provisioningServiceServer,
+        idGenerator,
+        clientSource);
+  }
+
+  private final ProcessManager processManager;
+  private final GrpcFnServer controlServiceServer;
+  private final GrpcFnServer loggingServiceServer;
+  private final GrpcFnServer retrievalServiceServer;
+  private final GrpcFnServer provisioningServiceServer;
+  private final IdGenerator idGenerator;
+  private final ControlClientPool.Source clientSource;
+
+  private ProcessEnvironmentFactory(
+      ProcessManager processManager,
+      GrpcFnServer controlServiceServer,
+      GrpcFnServer loggingServiceServer,
+      GrpcFnServer retrievalServiceServer,
+      GrpcFnServer provisioningServiceServer,
+      IdGenerator idGenerator,
+      ControlClientPool.Source clientSource) {
+    this.processManager = processManager;
+    this.controlServiceServer = controlServiceServer;
+    this.loggingServiceServer = loggingServiceServer;
+    this.retrievalServiceServer = retrievalServiceServer;
+    this.provisioningServiceServer = provisioningServiceS

[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139119
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 06:27
Start Date: 29/Aug/18 06:27
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213510484
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/JobBundleFactoryBase.java
 ##
 @@ -0,0 +1,331 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.control;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.RemovalNotification;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Iterables;
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.IOException;
+import java.util.Map;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.Target;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.ServerFactory;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import org.apache.beam.runners.fnexecution.control.SdkHarnessClient.BundleProcessor;
+import org.apache.beam.runners.fnexecution.data.GrpcDataService;
+import org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.EnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.RemoteEnvironment;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.logging.Slf4jLogWriter;
+import org.apache.beam.runners.fnexecution.provisioning.JobInfo;
+import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.runners.fnexecution.state.GrpcStateService;
+import org.apache.beam.runners.fnexecution.state.StateRequestHandler;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.apache.beam.sdk.fn.IdGenerators;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.fn.stream.OutboundObserverFactory;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A {@link JobBundleFactory} that uses a {@link DockerEnvironmentFactory} for environment
 
 Review comment:
   Please update the comment to reflect the updated functionality.




Issue Time Tracking
---

Worklog Id: (was: 139119)
Time Spent: 1h 10m  (was: 1h)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  

[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139115&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139115
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 06:27
Start Date: 29/Aug/18 06:27
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213554789
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/JobBundleFactoryBase.java
 ##
 @@ -0,0 +1,331 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.control;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.RemovalNotification;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Iterables;
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.IOException;
+import java.util.Map;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.Target;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.ServerFactory;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import org.apache.beam.runners.fnexecution.control.SdkHarnessClient.BundleProcessor;
+import org.apache.beam.runners.fnexecution.data.GrpcDataService;
+import org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.EnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.RemoteEnvironment;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.logging.Slf4jLogWriter;
+import org.apache.beam.runners.fnexecution.provisioning.JobInfo;
+import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.runners.fnexecution.state.GrpcStateService;
+import org.apache.beam.runners.fnexecution.state.StateRequestHandler;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.apache.beam.sdk.fn.IdGenerators;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.fn.stream.OutboundObserverFactory;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A {@link JobBundleFactory} that uses a {@link DockerEnvironmentFactory} for environment
+ * management. Note that returned {@link StageBundleFactory stage bundle factories} are not
+ * thread-safe. Instead, a new stage factory should be created for each client.
+ */
+@ThreadSafe
+public abstract class JobBundleFactoryBase implements JobBundleFactory {
+  protected final Logger logger = LoggerFactory.getLogger(getClass());
 
 Review comment:
   Let's use `LOG`.
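
   To illustrate the two styles being compared, here is a small sketch. It uses `java.util.logging` instead of SLF4J so that it runs without extra dependencies, and the class names `BaseWithLoggers` and `DockerLikeFactory` are made up for the example. The comment above does not state the reviewer's reasoning; the sketch only shows how the two declarations differ.

```java
import java.util.logging.Logger;

/** Hypothetical base class showing both logger declarations side by side. */
abstract class BaseWithLoggers {
  // Static style: one shared logger, always named after the declaring class.
  static final Logger LOG = Logger.getLogger(BaseWithLoggers.class.getName());

  // Instance style: one field per object, named after the runtime subclass.
  protected final Logger logger = Logger.getLogger(getClass().getName());

  String instanceLoggerName() {
    return logger.getName();
  }

  static String staticLoggerName() {
    return LOG.getName();
  }
}

/** Hypothetical subclass, standing in for a concrete bundle factory. */
final class DockerLikeFactory extends BaseWithLoggers {}
```

   The instance field attributes log lines to the concrete subclass, while a static `LOG` follows the common `private static final` convention and attributes them to the class that declares it.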




Issue Time Tracking
---

Worklog Id: (was: 139115)
Time Spent: 50m  (w

[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139118&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139118
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 06:27
Start Date: 29/Aug/18 06:27
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213516103
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DockerJobBundleFactory.java
 ##
 @@ -234,7 +125,7 @@ protected ServerFactory getServerFactory() {
 // We only use the published Docker ports 8100-8200 in a round-robin fashion
 () -> MAC_PORT.getAndUpdate(val -> val == MAC_PORT_END ? MAC_PORT_START : val + 1));
   default:
-LOG.warn("Unknown Docker platform. Falling back to default server factory");
+logger.warn("Unknown Docker platform. Falling back to default server factory");
 
 Review comment:
   Let's use `LOG` instead of `logger`.
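
   As a side note on the diff context above, the round-robin port selection built on `getAndUpdate` can be sketched in isolation. The class name `RoundRobinPorts` and its constructor are assumptions for the example; only the lambda mirrors the hunk.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper that hands out ports from a fixed range in round-robin order. */
final class RoundRobinPorts {
  private final int start;
  private final int end;
  private final AtomicInteger next;

  RoundRobinPorts(int start, int end) {
    this.start = start;
    this.end = end;
    this.next = new AtomicInteger(start);
  }

  /** Returns the current port and atomically advances, wrapping from end back to start. */
  int nextPort() {
    return next.getAndUpdate(val -> val == end ? start : val + 1);
  }
}
```

   Because `getAndUpdate` returns the value before the update, the first call yields the start of the range and the end of the range is handed out before wrapping.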




Issue Time Tracking
---

Worklog Id: (was: 139118)
Time Spent: 1h 10m  (was: 1h)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similarly 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]





[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139117
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 06:27
Start Date: 29/Aug/18 06:27
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213556884
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessEnvironmentFactory.java
 ##
 @@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.environment;
+
+import com.google.common.collect.ImmutableList;
+import java.time.Duration;
+import java.util.List;
+import java.util.concurrent.TimeoutException;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.control.ControlClientPool;
+import org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService;
+import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * An {@link EnvironmentFactory} which forks processes based on the given URL in the Environment.
+ * The returned {@link ProcessEnvironment} has to make sure to stop the processes.
+ */
+public class ProcessEnvironmentFactory implements EnvironmentFactory {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ProcessEnvironmentFactory.class);
+
+  public static ProcessEnvironmentFactory create(
+      GrpcFnServer controlServiceServer,
+      GrpcFnServer loggingServiceServer,
+      GrpcFnServer retrievalServiceServer,
+      GrpcFnServer provisioningServiceServer,
+      ControlClientPool.Source clientSource,
+      IdGenerator idGenerator) {
+    return create(
+        ProcessManager.getDefault(),
+        controlServiceServer,
+        loggingServiceServer,
+        retrievalServiceServer,
+        provisioningServiceServer,
+        clientSource,
+        idGenerator);
+  }
+
+  static ProcessEnvironmentFactory create(
+      ProcessManager processManager,
+      GrpcFnServer controlServiceServer,
+      GrpcFnServer loggingServiceServer,
+      GrpcFnServer retrievalServiceServer,
+      GrpcFnServer provisioningServiceServer,
+      ControlClientPool.Source clientSource,
+      IdGenerator idGenerator) {
+    return new ProcessEnvironmentFactory(
+        processManager,
+        controlServiceServer,
+        loggingServiceServer,
+        retrievalServiceServer,
+        provisioningServiceServer,
+        idGenerator,
+        clientSource);
+  }
+
+  private final ProcessManager processManager;
+  private final GrpcFnServer controlServiceServer;
+  private final GrpcFnServer loggingServiceServer;
+  private final GrpcFnServer retrievalServiceServer;
+  private final GrpcFnServer provisioningServiceServer;
+  private final IdGenerator idGenerator;
+  private final ControlClientPool.Source clientSource;
+
+  private ProcessEnvironmentFactory(
+      ProcessManager processManager,
+      GrpcFnServer controlServiceServer,
+      GrpcFnServer loggingServiceServer,
+      GrpcFnServer retrievalServiceServer,
+      GrpcFnServer provisioningServiceServer,
+      IdGenerator idGenerator,
+      ControlClientPool.Source clientSource) {
+    this.processManager = processManager;
+    this.controlServiceServer = controlServiceServer;
+    this.loggingServiceServer = loggingServiceServer;
+    this.retrievalServiceServer = retrievalServiceServer;
+    this.provisioningServiceServer = provisioningServiceS

[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139110
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 06:07
Start Date: 29/Aug/18 06:07
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #6287: [BEAM-5187] Add a 
ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#issuecomment-416835608
 
 
   @mxm thanks for taking this up!
   
   I used the changes in this PR to illustrate what I would like to accomplish as a custom extension here: 
https://github.com/tweise/beam/commits/processJobBundleFactory
   
   My goal is to launch the Python SDK harness directly (eliminating boot.go); artifact staging can be skipped.
   
   From a cursory test, it seems error handling in the process launch needs some fixing. We also need the ability to propagate the environment of the JVM and set additional environment variables to meet the contract of the Python worker.
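
   A rough sketch of the environment handling described above, using only `java.lang.ProcessBuilder` from the JDK. The class name `HarnessLauncherSketch`, the method `prepare`, and any command or variable names passed to it are hypothetical; this is not the code in the PR.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical launcher helper; it prepares a process but does not start it. */
final class HarnessLauncherSketch {
  static ProcessBuilder prepare(List<String> command, Map<String, String> extraEnv) {
    ProcessBuilder builder = new ProcessBuilder(command);
    // environment() starts out as a copy of the current JVM's environment,
    // so the child inherits it unless entries are removed here.
    builder.environment().putAll(extraEnv);
    // Merge stderr into stdout so launch failures show up on a single stream.
    builder.redirectErrorStream(true);
    return builder;
  }
}
```

   Calling `start()` on the returned builder throws an `IOException` if the command cannot be launched, which is one place where the error handling mentioned above would need to hook in.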
   




Issue Time Tracking
---

Worklog Id: (was: 139110)
Time Spent: 40m  (was: 0.5h)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similarly 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]





[beam] branch master updated: Fix typo.

2018-08-28 Thread thw
This is an automated email from the ASF dual-hosted git repository.

thw pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
 new 9d5c044  Fix typo.
9d5c044 is described below

commit 9d5c0442842ae648c4299846573a81562f141871
Author: Thomas Weise 
AuthorDate: Tue Aug 28 21:48:27 2018 -0700

Fix typo.
---
 .../beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java
index 8afe49c..4a1a348 100644
--- a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java
+++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java
@@ -196,7 +196,7 @@ public class FlinkStreamingPortablePipelineTranslator
   FlinkStreamingPortablePipelineTranslator.TranslationContext context) {
 throw new IllegalArgumentException(
 String.format(
-"Unknown type of URN %s for PTrasnform with id %s.",
+"Unknown type of URN %s for PTransform with id %s.",
 pipeline.getComponents().getTransformsOrThrow(id).getSpec().getUrn(), id));
   }
 



[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=139093&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139093
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 29/Aug/18 04:20
Start Date: 29/Aug/18 04:20
Worklog Time Spent: 10m 
  Work Description: tweise commented on a change in pull request #6287: 
[BEAM-5187] Add a ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#discussion_r213540661
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/JobBundleFactoryBase.java
 ##
 @@ -0,0 +1,331 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.control;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.RemovalNotification;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Iterables;
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.IOException;
+import java.util.Map;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.Target;
+import org.apache.beam.model.pipeline.v1.RunnerApi.Environment;
+import org.apache.beam.runners.core.construction.graph.ExecutableStage;
+import org.apache.beam.runners.fnexecution.GrpcContextHeaderAccessorProvider;
+import org.apache.beam.runners.fnexecution.GrpcFnServer;
+import org.apache.beam.runners.fnexecution.ServerFactory;
+import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService;
+import org.apache.beam.runners.fnexecution.control.ProcessBundleDescriptors.ExecutableProcessBundleDescriptor;
+import org.apache.beam.runners.fnexecution.control.SdkHarnessClient.BundleProcessor;
+import org.apache.beam.runners.fnexecution.data.GrpcDataService;
+import org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.EnvironmentFactory;
+import org.apache.beam.runners.fnexecution.environment.RemoteEnvironment;
+import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService;
+import org.apache.beam.runners.fnexecution.logging.Slf4jLogWriter;
+import org.apache.beam.runners.fnexecution.provisioning.JobInfo;
+import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService;
+import org.apache.beam.runners.fnexecution.state.GrpcStateService;
+import org.apache.beam.runners.fnexecution.state.StateRequestHandler;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.apache.beam.sdk.fn.IdGenerators;
+import org.apache.beam.sdk.fn.data.FnDataReceiver;
+import org.apache.beam.sdk.fn.stream.OutboundObserverFactory;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A {@link JobBundleFactory} that uses a {@link DockerEnvironmentFactory} for environment
+ * management. Note that returned {@link StageBundleFactory stage bundle factories} are not
+ * thread-safe. Instead, a new stage factory should be created for each client.
+ */
+@ThreadSafe
+public abstract class JobBundleFactoryBase implements JobBundleFactory {
+  protected final Logger logger = LoggerFactory.getLogger(getClass());
 
 Review comment:
   It would be better for each class to have its own logger (under its class 
name).
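The difference the reviewer points at can be sketched as follows. This is only an illustration under stated assumptions: `java.util.logging` stands in for SLF4J so the snippet has no external dependencies, and the class names `BundleFactoryBaseSketch`, `DockerBundleFactorySketch` and `LoggerNamingDemo` are hypothetical, not classes from the pull request. A logger created from `getClass()` in a base class takes the name of the runtime subclass, whereas a static logger per class is always named after the class that declares it.

```java
import java.util.logging.Logger;

// Hypothetical stand-in for the abstract base class under review.
abstract class BundleFactoryBaseSketch {
  // Pattern in the diff: an instance field whose name follows the runtime subclass.
  protected final Logger inheritedLogger = Logger.getLogger(getClass().getName());

  // Pattern suggested in the review: a static logger named after the declaring class.
  private static final Logger LOG =
      Logger.getLogger(BundleFactoryBaseSketch.class.getName());

  String inheritedLoggerName() {
    return inheritedLogger.getName();
  }

  static String baseLoggerName() {
    return LOG.getName();
  }
}

// Hypothetical subclass with its own static logger.
class DockerBundleFactorySketch extends BundleFactoryBaseSketch {
  private static final Logger LOG =
      Logger.getLogger(DockerBundleFactorySketch.class.getName());

  static String ownLoggerName() {
    return LOG.getName();
  }
}

class LoggerNamingDemo {
  public static void main(String[] args) {
    BundleFactoryBaseSketch factory = new DockerBundleFactorySketch();
    // Messages logged by base-class code through this field appear under the subclass name.
    System.out.println("inherited: " + factory.inheritedLoggerName());
    // Static loggers keep base-class and subclass messages under separate names.
    System.out.println("base:      " + BundleFactoryBaseSketch.baseLoggerName());
    System.out.println("subclass:  " + DockerBundleFactorySketch.ownLoggerName());
  }
}
```

With SLF4J the equivalent declaration would use `LoggerFactory.getLogger(SomeClass.class)`; the naming behaviour illustrated here is the same.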


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

[jira] [Work logged] (BEAM-4904) Beam Dependency Update Request: de.flapdoodle.embed:de.flapdoodle.embed.mongo 2.1.1

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4904?focusedWorklogId=139088&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139088
 ]

ASF GitHub Bot logged work on BEAM-4904:


Author: ASF GitHub Bot
Created on: 29/Aug/18 03:48
Start Date: 29/Aug/18 03:48
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #6281: 
[BEAM-4904][BEAM-4905] Upgrade Flapdoodle OSS dependencies
URL: https://github.com/apache/beam/pull/6281#issuecomment-416814943
 
 
   Hey @jbonofre, could you please take a look at this PR? 




Issue Time Tracking
---

Worklog Id: (was: 139088)
Time Spent: 20m  (was: 10m)

> Beam Dependency Update Request: de.flapdoodle.embed:de.flapdoodle.embed.mongo 
> 2.1.1
> ---
>
> Key: BEAM-4904
> URL: https://issues.apache.org/jira/browse/BEAM-4904
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Chamikara Jayalath
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> 2018-07-25 20:23:49.911490
> Please review and upgrade the 
> de.flapdoodle.embed:de.flapdoodle.embed.mongo to the latest version 2.1.1 
>  
> cc: 
> 2018-08-06 12:09:30.976479
> Please review and upgrade the 
> de.flapdoodle.embed:de.flapdoodle.embed.mongo to the latest version 2.1.1 
>  
> cc: 
> 2018-08-13 12:09:48.188897
> Please review and upgrade the 
> de.flapdoodle.embed:de.flapdoodle.embed.mongo to the latest version 2.1.1 
>  
> cc: 
> 2018-08-20 12:12:32.344889
> Please review and upgrade the 
> de.flapdoodle.embed:de.flapdoodle.embed.mongo to the latest version 2.1.1 
>  
> cc: 
> 2018-08-27 12:14:00.846640
> Please review and upgrade the 
> de.flapdoodle.embed:de.flapdoodle.embed.mongo to the latest version 2.1.1 
>  
> cc: 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5252) Fix the failing complex type test due to misusing reversed keyword of Calcite

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5252?focusedWorklogId=139085&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139085
 ]

ASF GitHub Bot logged work on BEAM-5252:


Author: ASF GitHub Bot
Created on: 29/Aug/18 03:31
Start Date: 29/Aug/18 03:31
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on issue #6290: [BEAM-5252][SQL] 
Improve complext type tests
URL: https://github.com/apache/beam/pull/6290#issuecomment-416812589
 
 
   LGTM




Issue Time Tracking
---

Worklog Id: (was: 139085)
Time Spent: 1h  (was: 50m)

> Fix the failing complex type test due to misusing reversed keyword of Calcite
> -
>
> Key: BEAM-5252
> URL: https://issues.apache.org/jira/browse/BEAM-5252
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Critical
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-4418) Improve gradle integration with IntelliJ

2018-08-28 Thread Luke Cwik (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595797#comment-16595797
 ] 

Luke Cwik commented on BEAM-4418:
-

Thanks [~rdub], I was just providing context. Note that revisiting those 
decisions may be the right thing if we can't develop code well as a community. 
I am really looking for suggestions since I and others are annoyed by the 
current state of things and I haven't been able to come up with any solutions.

> Improve gradle integration with IntelliJ
> 
>
> Key: BEAM-4418
> URL: https://issues.apache.org/jira/browse/BEAM-4418
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Etienne Chauchot
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> To be able to work efficiently with gradle, the integration with IntelliJ 
> (the most common IDE in the community, I think) needs to be improved. The aim of 
> this ticket is to gather areas of improvement discovered by people. Feel free 
> to comment on what you discovered.





[jira] [Work logged] (BEAM-4676) Samza runner documentation

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4676?focusedWorklogId=139054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139054
 ]

ASF GitHub Bot logged work on BEAM-4676:


Author: ASF GitHub Bot
Created on: 29/Aug/18 00:44
Start Date: 29/Aug/18 00:44
Worklog Time Spent: 10m 
  Work Description: lukecwik closed pull request #5815: [BEAM-4676] Samza 
Runner docs and capability matrix
URL: https://github.com/apache/beam/pull/5815
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/website/src/_data/capability-matrix.yml b/website/src/_data/capability-matrix.yml
index 580a191f48c..bc10ab6 100644
--- a/website/src/_data/capability-matrix.yml
+++ b/website/src/_data/capability-matrix.yml
@@ -30,6 +30,8 @@ columns:
 name: JStorm
   - class: ibmstreams
 name: IBM Streams
+  - class: samza
+name: Apache Samza
 
 categories:
   - description: What is being computed?
@@ -77,6 +79,10 @@ categories:
 l1: 'Yes'
 l2: fully supported
 l3: ''
+  - class: samza
+l1: 'Yes'
+l2: fully supported
+l3: Supported with per-element transformation.
   - name: GroupByKey
 values:
   - class: model
@@ -115,6 +121,10 @@ categories:
 l1: 'Yes'
 l2: fully supported
 l3: ''
+  - class: samza
+l1: 'Yes'
+l2: fully supported
+l3: "Uses Samza's partitionBy for key grouping and Beam's logic for window aggregation and triggering."
   - name: Flatten
 values:
   - class: model
@@ -153,6 +163,10 @@ categories:
 l1: 'Yes'
 l2: fully supported
 l3: ''
+  - class: samza
+l1: 'Yes'
+l2: fully supported
+l3: ''
   - name: Combine
 values:
   - class: model
@@ -191,6 +205,10 @@ categories:
 l1: 'Yes'
 l2: fully supported
 l3: ''
+  - class: samza
+l1: 'Yes'
+l2: fully supported
+l3: Use combiner for efficient pre-aggregation.
   - name: Composite Transforms
 values:
   - class: model
@@ -229,6 +247,10 @@ categories:
 l1: 'Partially'
 l2: supported via inlining
 l3: ''
+  - class: samza
+l1: 'Partially'
+l2: supported via inlining
+l3: ''
   - name: Side Inputs
 values:
   - class: model
@@ -267,6 +289,10 @@ categories:
 l1: 'Yes'
 l2: fully supported
 l3: ''
+  - class: samza
+l1: 'Yes'
+l2: fully supported
+l3: Uses Samza's broadcast operator to distribute the side inputs.
   - name: Source API
 values:
   - class: model
@@ -305,6 +331,10 @@ categories:
 l1: 'Yes'
 l2: fully supported
 l3: ''
+  - class: samza
+l1: 'Yes'
+l2: fully supported
+l3: ''
   - name: Splittable DoFn
 values:
   - class: model
@@ -342,7 +372,11 @@ categories:
   - class: ibmstreams
 l1: 'No'
 l2: not implemented
-l3: 
+l3:
+  - class: samza
+l1: 'No'
+l2: not implemented
+l3:
   - name: Metrics
 values:
   - class: model
@@ -381,6 +415,10 @@ categories:
 l1: 'Partially'
 l2: All metrics types are supported.
 l3: Only attempted values are supported. No committed values for metrics.
+  - class: samza
+l1: 'Partially'
+l2: Counter and Gauge are supported.
+l3: Only attempted values are supported. No committed values for metrics.
   - name: Stateful Processing
 values:
   - class: model
@@ -419,6 +457,10 @@ categories:
 l1: 'Partially'
 l2: non-merging windows
 l3: ''
+  - class: samza
+l1: 'Partially'
+l2: non-merging windows
+l3: 'States are backed up by either rocksDb KV store or in-memory hash map, and persist using changelog.'
   - description: Where in event time?
 anchor: where
 color-b: '37d'
@@ -464,6 +506,10 @@ categories:
 l1: 'Yes'
 l2: supported
 l3: ''
+  - class: samza
+l1: 'Yes'
+l2: supported
+l3: ''
   - name: Fixed windows
 values:
   - class: model
@@ -502,6 +548,10 @@ categories:
  

[jira] [Work logged] (BEAM-4676) Samza runner documentation

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4676?focusedWorklogId=139053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139053
 ]

ASF GitHub Bot logged work on BEAM-4676:


Author: ASF GitHub Bot
Created on: 29/Aug/18 00:44
Start Date: 29/Aug/18 00:44
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #5815: [BEAM-4676] Samza 
Runner docs and capability matrix
URL: https://github.com/apache/beam/pull/5815#issuecomment-416785897
 
 
   SGTM




Issue Time Tracking
---

Worklog Id: (was: 139053)
Time Spent: 50m  (was: 40m)

> Samza runner documentation
> --
>
> Key: BEAM-4676
> URL: https://issues.apache.org/jira/browse/BEAM-4676
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-samza
>Reporter: Xinyu Liu
>Assignee: Xinyu Liu
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add the user guide, examples and capability matrix for Samza runner.





Jenkins build is back to normal : beam_PerformanceTests_HadoopInputFormat #700

2018-08-28 Thread Apache Jenkins Server
See 




[jira] [Work logged] (BEAM-5254) Add Samza Runner translator registrar and refactor config generation

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5254?focusedWorklogId=139049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139049
 ]

ASF GitHub Bot logged work on BEAM-5254:


Author: ASF GitHub Bot
Created on: 29/Aug/18 00:30
Start Date: 29/Aug/18 00:30
Worklog Time Spent: 10m 
  Work Description: xinyuiscool opened a new pull request #6292: 
[BEAM-5254] Add Samza Runner translator registrar and refactor config
URL: https://github.com/apache/beam/pull/6292
 
 
   Add a registrar for transform translators in Samza Runner so we allow 
customized translators. Also refactor the config generation part so it can be 
extended outside open source Beam.
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 139049)
Time Spent: 10m
Remaining Estimate: 0h

> Add Samza Runner translator registrar and refactor config generation
> 
>
> Key: BEAM-5254
> URL: https://issues.apache.org/jira/browse/BEAM-5254
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-samza
>Reporter:

[jira] [Commented] (BEAM-4418) Improve gradle integration with IntelliJ

2018-08-28 Thread Ryan Williams (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595781#comment-16595781
 ] 

Ryan Williams commented on BEAM-4418:
-

That makes sense [~lcwik], I didn't mean to imply that vendoring was the wrong 
decision, just wanted to document where we're at. Thanks for the additional 
context (and the work on vendoring in the first place)!

> Improve gradle integration with IntelliJ
> 
>
> Key: BEAM-4418
> URL: https://issues.apache.org/jira/browse/BEAM-4418
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Etienne Chauchot
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> To be able to work efficiently with gradle, the integration with IntelliJ 
> (the most common IDE in the community, I think) needs to be improved. The aim of 
> this ticket is to gather areas of improvement discovered by people. Feel free 
> to comment on what you discovered.





[jira] [Work logged] (BEAM-4676) Samza runner documentation

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4676?focusedWorklogId=139046&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139046
 ]

ASF GitHub Bot logged work on BEAM-4676:


Author: ASF GitHub Bot
Created on: 29/Aug/18 00:25
Start Date: 29/Aug/18 00:25
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #5815: [BEAM-4676] Samza 
Runner docs and capability matrix
URL: https://github.com/apache/beam/pull/5815#issuecomment-416782910
 
 
   This pull request is no longer marked as stale.
   




Issue Time Tracking
---

Worklog Id: (was: 139046)
Time Spent: 0.5h  (was: 20m)

> Samza runner documentation
> --
>
> Key: BEAM-4676
> URL: https://issues.apache.org/jira/browse/BEAM-4676
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-samza
>Reporter: Xinyu Liu
>Assignee: Xinyu Liu
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add the user guide, examples and capability matrix for Samza runner.





[jira] [Work logged] (BEAM-4676) Samza runner documentation

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4676?focusedWorklogId=139047&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139047
 ]

ASF GitHub Bot logged work on BEAM-4676:


Author: ASF GitHub Bot
Created on: 29/Aug/18 00:25
Start Date: 29/Aug/18 00:25
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #5815: [BEAM-4676] Samza 
Runner docs and capability matrix
URL: https://github.com/apache/beam/pull/5815#issuecomment-416783003
 
 
   Note that the website actually exists on apache/beam-site and has not yet 
migrated to be part of the apache/beam repo.




Issue Time Tracking
---

Worklog Id: (was: 139047)
Time Spent: 40m  (was: 0.5h)

> Samza runner documentation
> --
>
> Key: BEAM-4676
> URL: https://issues.apache.org/jira/browse/BEAM-4676
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-samza
>Reporter: Xinyu Liu
>Assignee: Xinyu Liu
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add the user guide, examples and capability matrix for Samza runner.





[jira] [Assigned] (BEAM-5247) Remove slf4j-simple binding from dependencies

2018-08-28 Thread Luke Cwik (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-5247:
---

Assignee: Jozef Vilcek  (was: Aljoscha Krettek)

> Remove slf4j-simple binding from dependencies
> -
>
> Key: BEAM-5247
> URL: https://issues.apache.org/jira/browse/BEAM-5247
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Jozef Vilcek
>Assignee: Jozef Vilcek
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Flink runner declares a slf4j-simple binding in dependencies. This can break 
> logging of applications that have their own binding and do not exclude 
> this one from Beam.





[jira] [Created] (BEAM-5254) Add Samza Runner translator registrar and refactor config generation

2018-08-28 Thread Xinyu Liu (JIRA)
Xinyu Liu created BEAM-5254:
---

 Summary: Add Samza Runner translator registrar and refactor config 
generation
 Key: BEAM-5254
 URL: https://issues.apache.org/jira/browse/BEAM-5254
 Project: Beam
  Issue Type: Improvement
  Components: runner-samza
Reporter: Xinyu Liu
Assignee: Xinyu Liu


Add a registrar for transform translators in Samza Runner so we allow 
customized translators. Also refactor the config generation part so it can be 
extended outside open source Beam.





[jira] [Commented] (BEAM-4418) Improve gradle integration with IntelliJ

2018-08-28 Thread Luke Cwik (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595749#comment-16595749
 ] 

Luke Cwik commented on BEAM-4418:
-

There is a lot of complexity in shading such that the Java artifacts produced 
are compatible with as many runners as possible and simplify dependency 
requirements for users.

I'm all ears for how to improve, but vendoring has significant tangible benefits 
in cleaning up the dependencies we export. I'm all ears for different proposals 
as vendoring has been discussed and agreed upon twice in the past: 

[https://lists.apache.org/thread.html/12383d2e5d70026427df43294e30d6524334e16f03d86c9a5860792f@%3Cdev.beam.apache.org%3E]

[https://lists.apache.org/thread.html/8b9b3768adfc40d3527d1ce5e8a51d90e5782a348a3abfb9e5dc85ef@%3Cdev.beam.apache.org%3E]

 

> Improve gradle integration with IntelliJ
> 
>
> Key: BEAM-4418
> URL: https://issues.apache.org/jira/browse/BEAM-4418
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Etienne Chauchot
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> To be able to work efficiently with gradle, the integration with IntelliJ 
> (the most common IDE in the community, I think) needs to be improved. The aim of 
> this ticket is to gather areas of improvement discovered by people. Feel free 
> to comment on what you discovered.





[jira] [Work logged] (BEAM-5239) Allow configure latencyTrackingInterval

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5239?focusedWorklogId=139035&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139035
 ]

ASF GitHub Bot logged work on BEAM-5239:


Author: ASF GitHub Bot
Created on: 28/Aug/18 23:18
Start Date: 28/Aug/18 23:18
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6278: 
[BEAM-5239] Enable to configure latencyTrackingInterval
URL: https://github.com/apache/beam/pull/6278#discussion_r213502218
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
 ##
 @@ -167,4 +167,11 @@
   Boolean isShutdownSourcesOnFinalWatermark();
 
   void setShutdownSourcesOnFinalWatermark(Boolean shutdownOnFinalWatermark);
+
+  @Description(
+  "Interval in milliseconds for sending latency tracking marks from the sources to the sinks.")
+  @Default.Long(-1L)
+  Long getLatencyTrackingInterval();
 
 Review comment:
   This looks like a global debug option and not a Flink-specific option. Let's 
move it to something like Tracking/PerformanceOption.




Issue Time Tracking
---

Worklog Id: (was: 139035)
Time Spent: 0.5h  (was: 20m)

> Allow configure latencyTrackingInterval
> ---
>
> Key: BEAM-5239
> URL: https://issues.apache.org/jira/browse/BEAM-5239
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Because of FLINK-10226, we need to be able to set 
> latencyTrackingConfiguration for flink via FlinkPipelineOptions





[jira] [Work logged] (BEAM-5239) Allow configure latencyTrackingInterval

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5239?focusedWorklogId=139036&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139036
 ]

ASF GitHub Bot logged work on BEAM-5239:


Author: ASF GitHub Bot
Created on: 28/Aug/18 23:18
Start Date: 28/Aug/18 23:18
Worklog Time Spent: 10m 
  Work Description: angoenka commented on a change in pull request #6278: 
[BEAM-5239] Enable to configure latencyTrackingInterval
URL: https://github.com/apache/beam/pull/6278#discussion_r213501162
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java
 ##
 @@ -171,4 +176,12 @@ public static StreamExecutionEnvironment createStreamExecutionEnvironment(
 
 return flinkStreamEnv;
   }
+
+  private static void applyLatencyTrackingInterval(
+  ExecutionConfig config, FlinkPipelineOptions options) {
+long latencyTrackingInterval = options.getLatencyTrackingInterval();
+if (latencyTrackingInterval != -1) {
 
 Review comment:
   nit: `latencyTrackingInterval > 0`
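   To make the nit concrete, here is a small sketch comparing the two guards. The class and method names are hypothetical and only illustrate the comparison; the PR itself performs the check inside `applyLatencyTrackingInterval`. The `!= -1` form treats only the declared default as "unset", so `0` or any other negative value would still be passed on, while `> 0` skips every non-positive value.

```java
// Hypothetical helper for illustration; not part of the pull request.
class LatencyIntervalCheckSketch {

  // Guard as written in the diff: only the -1 default counts as "not configured".
  static boolean appliesWithSentinelCheck(long interval) {
    return interval != -1;
  }

  // Guard suggested in the review: zero and all negative values count as "not configured".
  static boolean appliesWithPositiveCheck(long interval) {
    return interval > 0;
  }

  public static void main(String[] args) {
    long[] samples = {-1L, 0L, -5L, 2000L};
    for (long interval : samples) {
      System.out.println(
          interval
              + " -> sentinel check: " + appliesWithSentinelCheck(interval)
              + ", positive check: " + appliesWithPositiveCheck(interval));
    }
  }
}
```

   Whether `0` should be forwarded to the runner depends on what the runner does with that value, which this thread does not state; the sketch only shows where the two guards disagree.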
   




Issue Time Tracking
---

Worklog Id: (was: 139036)

> Allow configure latencyTrackingInterval
> ---
>
> Key: BEAM-5239
> URL: https://issues.apache.org/jira/browse/BEAM-5239
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Aljoscha Krettek
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Because of FLINK-10226, we need to be able to set 
> latencyTrackingConfiguration for flink via FlinkPipelineOptions





[jira] [Created] (BEAM-5253) Go SDK PubSub example currently broken

2018-08-28 Thread Sean Patrick Hagen (JIRA)
Sean Patrick Hagen created BEAM-5253:


 Summary: Go SDK PubSub example currently broken
 Key: BEAM-5253
 URL: https://issues.apache.org/jira/browse/BEAM-5253
 Project: Beam
  Issue Type: Bug
  Components: sdk-go
Reporter: Sean Patrick Hagen
Assignee: Henning Rohde


The Go SDK contains an example for creating a streaming pipeline that reads 
from pubsub and outputs the messages. It can be found here: 
[https://github.com/apache/beam/blob/master/sdks/go/examples/streaming_wordcap/wordcap.go]

 

This example is broken and does not work. It fails with the error "failed to 
execute job: translation failed: no root units" when I try to run it with the 
direct runner, and it just fails with "Internal Issue (8ed815a0a259018f): 
65177287:8503 " when run in Google Dataflow.

 

 





[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=139013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139013
 ]

ASF GitHub Bot logged work on BEAM-5240:


Author: ASF GitHub Bot
Created on: 28/Aug/18 22:21
Start Date: 28/Aug/18 22:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6277: 
[BEAM-5240] Add metrics dashboard deployment script and logic
URL: https://github.com/apache/beam/pull/6277#discussion_r213489767
 
 

 ##
 File path: .test-infra/metrics/sync/jenkins/syncjenkins.py
 ##
 @@ -0,0 +1,192 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+
+# Queries Jenkins to collect metrics and put them in bigquery.
+import time
+import requests
+import psycopg2
+import os
+import re
+from datetime import datetime, timedelta
+import sys
+from xml.etree import ElementTree
+
+
+# Keeping this as reference for localhost debug
+# Fetching docker host machine ip for testing purposes.
+# Actual host should be used for production.
+# import subprocess
+# cmd_out = subprocess.check_output(["ip", "route", "show"]).decode("utf-8")
+# host = cmd_out.split(" ")[2]
+
+host = os.environ['JENSYNC_HOST']
+port = os.environ['JENSYNC_PORT']
+dbname = os.environ['JENSYNC_DBNAME']
+dbusername = os.environ['JENSYNC_DBUSERNAME']
+dbpassword = os.environ['JENSYNC_DBPWD']
+
+jenkinsBuildsTableName = 'jenkins_builds'
+
+jenkinsJobsCreateTableQuery = f"""
+create table {jenkinsBuildsTableName} (
+job_name varchar NOT NULL,
+build_id integer NOT NULL,
+build_url varchar,
+build_result varchar,
+build_timestamp TIMESTAMP,
+build_builtOn varchar,
+build_duration integer,
+build_estimatedDuration integer,
+build_fullDisplayName varchar,
+timing_blockedDurationMillis integer,
+timing_buildableDurationMillis integer,
+timing_buildingDurationMillis integer,
+timing_executingTimeMillis integer,
+timing_queuingDurationMillis integer,
+timing_totalDurationMillis integer,
+timing_waitingDurationMillis integer,
+primary key(job_name, build_id)
+)
+"""
+
+# returns (jobName, lastBuildId, jobUrl)
+def fetchJobs():
+  url = 'https://builds.apache.org/view/A-D/view/Beam/api/json?tree=jobs[name,url,lastCompletedBuild[id]]&depth=1'
+  r = requests.get(url)
+  jobs = r.json()[u'jobs']
+  print("---")
+  print(jobs)
+  result = map(lambda x: (x['name'], int(x['lastCompletedBuild']['id']) if x['lastCompletedBuild'] is not None else -1, x['url']), jobs)
+  return result
+
+def initConnection():
+  conn = psycopg2.connect(f"dbname='{dbname}' user='{dbusername}' host='{host}' port='{port}' password='{dbpassword}'")
+  return conn
+
+def tableExists(cursor, tableName):
+  cursor.execute(f"select * from information_schema.tables where table_name='{tableName}';")
+  return bool(cursor.rowcount)
+
+def createTable(connection, tableName, tableSchema):
 
 Review comment:
   This function is never called I believe?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 139013)
Time Spent: 1h 10m  (was: 1h)

> Create post-commit tests dashboard
> --
>
> Key: BEAM-5240
> URL: https://issues.apache.org/jira/browse/BEAM-5240
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=139010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139010
 ]

ASF GitHub Bot logged work on BEAM-5240:


Author: ASF GitHub Bot
Created on: 28/Aug/18 22:21
Start Date: 28/Aug/18 22:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6277: 
[BEAM-5240] Add metrics dashboard deployment script and logic
URL: https://github.com/apache/beam/pull/6277#discussion_r213489437
 
 

 ##
 File path: .test-infra/metrics/sync/jenkins/syncjenkins.py
 ##
 @@ -0,0 +1,192 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+
+# Queries Jenkins to collect metrics and put them in bigquery.
+import time
+import requests
+import psycopg2
+import os
+import re
+from datetime import datetime, timedelta
+import sys
+from xml.etree import ElementTree
+
+
+# Keeping this as reference for localhost debug
+# Fetching docker host machine ip for testing purposes.
+# Actual host should be used for production.
+# import subprocess
+# cmd_out = subprocess.check_output(["ip", "route", "show"]).decode("utf-8")
+# host = cmd_out.split(" ")[2]
+
+host = os.environ['JENSYNC_HOST']
+port = os.environ['JENSYNC_PORT']
+dbname = os.environ['JENSYNC_DBNAME']
+dbusername = os.environ['JENSYNC_DBUSERNAME']
+dbpassword = os.environ['JENSYNC_DBPWD']
+
+jenkinsBuildsTableName = 'jenkins_builds'
+
+jenkinsJobsCreateTableQuery = f"""
+create table {jenkinsBuildsTableName} (
+job_name varchar NOT NULL,
+build_id integer NOT NULL,
+build_url varchar,
+build_result varchar,
+build_timestamp TIMESTAMP,
+build_builtOn varchar,
+build_duration integer,
+build_estimatedDuration integer,
+build_fullDisplayName varchar,
+timing_blockedDurationMillis integer,
+timing_buildableDurationMillis integer,
+timing_buildingDurationMillis integer,
+timing_executingTimeMillis integer,
+timing_queuingDurationMillis integer,
+timing_totalDurationMillis integer,
+timing_waitingDurationMillis integer,
+primary key(job_name, build_id)
+)
+"""
+
+# returns (jobName, lastBuildId, jobUrl)
+def fetchJobs():
+  url = 'https://builds.apache.org/view/A-D/view/Beam/api/json?tree=jobs[name,url,lastCompletedBuild[id]]&depth=1'
+  r = requests.get(url)
+  jobs = r.json()[u'jobs']
+  print("---")
+  print(jobs)
+  result = map(lambda x: (x['name'], int(x['lastCompletedBuild']['id']) if x['lastCompletedBuild'] is not None else -1, x['url']), jobs)
+  return result
+
+def initConnection():
+  conn = psycopg2.connect(f"dbname='{dbname}' user='{dbusername}' host='{host}' port='{port}' password='{dbpassword}'")
+  return conn
+
+def tableExists(cursor, tableName):
+  cursor.execute(f"select * from information_schema.tables where table_name='{tableName}';")
+  return bool(cursor.rowcount)
+
+def createTable(connection, tableName, tableSchema):
+  cursor = connection.cursor()
+  cmd = f"create table {tableName} ({tableSchema});"
+  cursor.execute(cmd)
+  cursor.close()
+  connection.commit()
+  return bool(cursor.rowcount)
+
+
+def initDbTablesIfNeeded():
+  connection = initConnection()
+  cursor = connection.cursor()
+
+  res = tableExists(cursor, jenkinsBuildsTableName)
+  if not res:
+    cursor.execute(jenkinsJobsCreateTableQuery)
+    if not bool(cursor.rowcount):
+      raise Exception(f"Failed to create table {jenkinsBuildsTableName}")
+
+  cursor.close()
+  connection.commit()
+
+  connection.close()
+
+def fetchSyncedJobsBuildVersions(cursor):
+  fetchQuery = f'''
+  select job_name, max(build_id)
+  from {jenkinsBuildsTableName}
+  group by job_name
+  '''
+
+  cursor.execute(fetchQuery)
+  return dict(cursor.fetchall())
+
+def fetchBuildsForJob(jobUrl):
+  durFields = 'blockedDurationMillis,buildableDurationMillis,buildingDurationMillis,executingTimeMillis,queuingDurationMillis,totalDurationMillis,waitingDurationMillis'
+  fields = f'result,timestamp,id,url,builtOn,building,duration,estimatedDuration,fullDisplayName,actions[{durFields}]'
+  url = f'{jobUrl}api/json?depth=1&tree=builds[{fields}]'
+  r = requests.get(url)
+  return r.json()[u'builds']
+
+def buildRowValuesArray(jobName, build):
+  timings = next((x for x in build[u'actions'] if (u'_class' in x) and (x[u'_class'] == u'jenkins.metrics.impl.TimeInQueueAction')), None)
+  values = [jobName,
+  int(build[u'id']),
+  build[u'url'],
+  build[u'result'],
+  datetime.fromtimestamp(build[u'timestamp'] / 1000),
+  buil

[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=139009&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139009
 ]

ASF GitHub Bot logged work on BEAM-5240:


Author: ASF GitHub Bot
Created on: 28/Aug/18 22:21
Start Date: 28/Aug/18 22:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6277: 
[BEAM-5240] Add metrics dashboard deployment script and logic
URL: https://github.com/apache/beam/pull/6277#discussion_r213486192
 
 

 ##
 File path: .test-infra/metrics/README.md
 ##
 @@ -0,0 +1,106 @@
+# BeamMonitoring
+This folder contains files required to spin-up metrics dashboard for Beam.
+
+## Utilized technologies
+* [Grafana](https://https://grafana.com) as dashboarding engine.
 
 Review comment:
   There's `https://https://` here




Issue Time Tracking
---

Worklog Id: (was: 139009)
Time Spent: 1h  (was: 50m)

> Create post-commit tests dashboard
> --
>
> Key: BEAM-5240
> URL: https://issues.apache.org/jira/browse/BEAM-5240
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=139008&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139008
 ]

ASF GitHub Bot logged work on BEAM-5240:


Author: ASF GitHub Bot
Created on: 28/Aug/18 22:21
Start Date: 28/Aug/18 22:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6277: 
[BEAM-5240] Add metrics dashboard deployment script and logic
URL: https://github.com/apache/beam/pull/6277#discussion_r213486818
 
 

 ##
 File path: .test-infra/metrics/README.md
 ##
 @@ -0,0 +1,106 @@
+# BeamMonitoring
+This folder contains files required to spin-up metrics dashboard for Beam.
+
+## Utilized technologies
+* [Grafana](https://https://grafana.com) as dashboarding engine.
+* PostgreSQL as underlying DB.
+
+The approach is to fetch data from the corresponding systems (Jenkins, Jira, GithubArchives, etc.), put it into PostgreSQL, and read it from there to show in Grafana.
+
+## Local setup
+
+Install docker
+* install docker
+* https://docs.docker.com/install/#supported-platforms
+* install docker-compose
+* https://docs.docker.com/compose/install/#install-compose
+
+```sh
+# Remove old docker
+sudo apt-get remove docker docker-engine docker.io
+
+# Install docker
+sudo apt-get update
+sudo apt-get install \
+ apt-transport-https \
+ ca-certificates \
+ curl \
+ gnupg2 \
+ software-properties-common
+curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
+sudo apt-key fingerprint 0EBFCD88
+sudo add-apt-repository \
+   "deb [arch=amd64] https://download.docker.com/linux/debian \
+   $(lsb_release -cs) \
+   stable"
+sudo apt-get update
+sudo apt-get install docker-ce
+
+# Install docker-compose
+sudo curl -L https://github.com/docker/compose/releases/download/1.22.0/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose
+sudo chmod +x /usr/local/bin/docker-compose
+
+# start docker service if it is not running already
+sudo service docker start
+```
+
+## Kubernetes setup
+
+1. Configure gcloud & kubectl
+  * https://cloud.google.com/kubernetes-engine/docs/quickstart
+2. Configure PostgreSQL
+a. https://pantheon.corp.google.com/sql/instances?project=apache-beam-testing
+b. Check on this link to configure connection from kubernetes to postgresql: https://cloud.google.com/sql/docs/postgres/connect-kubernetes-engine
+3. add secrets for grafana
+a. `kubectl create secret generic grafana-admin-pwd --from-literal=grafana_admin_password=`
+4. create persistent volume claims:
+```sh
+kubectl create -f beam-grafana-etcdata-persistentvolumeclaim.yaml
+kubectl create -f beam-grafana-libdata-persistentvolumeclaim.yaml
+kubectl create -f beam-grafana-logdata-persistentvolumeclaim.yaml
 
 Review comment:
   These are after cloning the beam repo right? Maybe mention it? (not sure if 
it's necessary..., but idk, up to you)




Issue Time Tracking
---

Worklog Id: (was: 139008)
Time Spent: 1h  (was: 50m)

> Create post-commit tests dashboard
> --
>
> Key: BEAM-5240
> URL: https://issues.apache.org/jira/browse/BEAM-5240
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=139012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139012
 ]

ASF GitHub Bot logged work on BEAM-5240:


Author: ASF GitHub Bot
Created on: 28/Aug/18 22:21
Start Date: 28/Aug/18 22:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6277: 
[BEAM-5240] Add metrics dashboard deployment script and logic
URL: https://github.com/apache/beam/pull/6277#discussion_r213488383
 
 

 ##
 File path: .test-infra/metrics/sync/jenkins/syncjenkins.py
 ##
 @@ -0,0 +1,192 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+
+# Queries Jenkins to collect metrics and put them in bigquery.
+import time
+import requests
+import psycopg2
+import os
+import re
+from datetime import datetime, timedelta
+import sys
+from xml.etree import ElementTree
+
+
+# Keeping this as reference for localhost debug
+# Fetching docker host machine ip for testing purposes.
+# Actual host should be used for production.
+# import subprocess
+# cmd_out = subprocess.check_output(["ip", "route", "show"]).decode("utf-8")
+# host = cmd_out.split(" ")[2]
+
+host = os.environ['JENSYNC_HOST']
+port = os.environ['JENSYNC_PORT']
+dbname = os.environ['JENSYNC_DBNAME']
+dbusername = os.environ['JENSYNC_DBUSERNAME']
+dbpassword = os.environ['JENSYNC_DBPWD']
+
+jenkinsBuildsTableName = 'jenkins_builds'
+
+jenkinsJobsCreateTableQuery = f"""
+create table {jenkinsBuildsTableName} (
+job_name varchar NOT NULL,
+build_id integer NOT NULL,
+build_url varchar,
+build_result varchar,
+build_timestamp TIMESTAMP,
+build_builtOn varchar,
+build_duration integer,
+build_estimatedDuration integer,
+build_fullDisplayName varchar,
+timing_blockedDurationMillis integer,
+timing_buildableDurationMillis integer,
+timing_buildingDurationMillis integer,
+timing_executingTimeMillis integer,
+timing_queuingDurationMillis integer,
+timing_totalDurationMillis integer,
+timing_waitingDurationMillis integer,
+primary key(job_name, build_id)
+)
+"""
+
+# returns (jobName, lastBuildId, jobUrl)
+def fetchJobs():
+  url = 'https://builds.apache.org/view/A-D/view/Beam/api/json?tree=jobs[name,url,lastCompletedBuild[id]]&depth=1'
+  r = requests.get(url)
+  jobs = r.json()[u'jobs']
+  print("---")
+  print(jobs)
+  result = map(lambda x: (x['name'], int(x['lastCompletedBuild']['id']) if x['lastCompletedBuild'] is not None else -1, x['url']), jobs)
+  return result
+
+def initConnection():
+  conn = psycopg2.connect(f"dbname='{dbname}' user='{dbusername}' host='{host}' port='{port}' password='{dbpassword}'")
+  return conn
+
+def tableExists(cursor, tableName):
+  cursor.execute(f"select * from information_schema.tables where table_name='{tableName}';")
+  return bool(cursor.rowcount)
+
+def createTable(connection, tableName, tableSchema):
+  cursor = connection.cursor()
+  cmd = f"create table {tableName} ({tableSchema});"
+  cursor.execute(cmd)
+  cursor.close()
+  connection.commit()
+  return bool(cursor.rowcount)
+
+
+def initDbTablesIfNeeded():
+  connection = initConnection()
+  cursor = connection.cursor()
+
+  res = tableExists(cursor, jenkinsBuildsTableName)
+  if not res:
+    cursor.execute(jenkinsJobsCreateTableQuery)
+    if not bool(cursor.rowcount):
+      raise Exception(f"Failed to create table {jenkinsBuildsTableName}")
+
+  cursor.close()
+  connection.commit()
+
+  connection.close()
+
+def fetchSyncedJobsBuildVersions(cursor):
+  fetchQuery = f'''
+  select job_name, max(build_id)
+  from {jenkinsBuildsTableName}
+  group by job_name
+  '''
+
+  cursor.execute(fetchQuery)
+  return dict(cursor.fetchall())
+
+def fetchBuildsForJob(jobUrl):
+  durFields = 'blockedDurationMillis,buildableDurationMillis,buildingDurationMillis,executingTimeMillis,queuingDurationMillis,totalDurationMillis,waitingDurationMillis'
+  fields = f'result,timestamp,id,url,builtOn,building,duration,estimatedDuration,fullDisplayName,actions[{durFields}]'
+  url = f'{jobUrl}api/json?depth=1&tree=builds[{fields}]'
+  r = requests.get(url)
+  return r.json()[u'builds']
+
+def buildRowValuesArray(jobName, build):
+  timings = next((x for x in build[u'actions'] if (u'_class' in x) and (x[u'_class'] == u'jenkins.metrics.impl.TimeInQueueAction')), None)
+  values = [jobName,
+  int(build[u'id']),
+  build[u'url'],
+  build[u'result'],
+  datetime.fromtimestamp(build[u'timestamp'] / 1000),
+  buil

[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=139011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139011
 ]

ASF GitHub Bot logged work on BEAM-5240:


Author: ASF GitHub Bot
Created on: 28/Aug/18 22:21
Start Date: 28/Aug/18 22:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6277: 
[BEAM-5240] Add metrics dashboard deployment script and logic
URL: https://github.com/apache/beam/pull/6277#discussion_r213490767
 
 

 ##
 File path: .test-infra/metrics/sync/jenkins/syncjenkins.py
 ##
 @@ -0,0 +1,192 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+
+# Queries Jenkins to collect metrics and put them in bigquery.
+import time
+import requests
+import psycopg2
+import os
+import re
+from datetime import datetime, timedelta
+import sys
+from xml.etree import ElementTree
+
+
+# Keeping this as reference for localhost debug
+# Fetching docker host machine ip for testing purposes.
+# Actual host should be used for production.
+# import subprocess
+# cmd_out = subprocess.check_output(["ip", "route", "show"]).decode("utf-8")
+# host = cmd_out.split(" ")[2]
+
+host = os.environ['JENSYNC_HOST']
+port = os.environ['JENSYNC_PORT']
+dbname = os.environ['JENSYNC_DBNAME']
+dbusername = os.environ['JENSYNC_DBUSERNAME']
+dbpassword = os.environ['JENSYNC_DBPWD']
+
+jenkinsBuildsTableName = 'jenkins_builds'
+
+jenkinsJobsCreateTableQuery = f"""
+create table {jenkinsBuildsTableName} (
+job_name varchar NOT NULL,
+build_id integer NOT NULL,
+build_url varchar,
+build_result varchar,
+build_timestamp TIMESTAMP,
+build_builtOn varchar,
+build_duration integer,
+build_estimatedDuration integer,
+build_fullDisplayName varchar,
+timing_blockedDurationMillis integer,
+timing_buildableDurationMillis integer,
+timing_buildingDurationMillis integer,
+timing_executingTimeMillis integer,
+timing_queuingDurationMillis integer,
+timing_totalDurationMillis integer,
+timing_waitingDurationMillis integer,
+primary key(job_name, build_id)
+)
+"""
+
+# returns (jobName, lastBuildId, jobUrl)
+def fetchJobs():
+  url = 'https://builds.apache.org/view/A-D/view/Beam/api/json?tree=jobs[name,url,lastCompletedBuild[id]]&depth=1'
+  r = requests.get(url)
+  jobs = r.json()[u'jobs']
+  print("---")
+  print(jobs)
+  result = map(lambda x: (x['name'], int(x['lastCompletedBuild']['id']) if x['lastCompletedBuild'] is not None else -1, x['url']), jobs)
+  return result
+
+def initConnection():
+  conn = psycopg2.connect(f"dbname='{dbname}' user='{dbusername}' host='{host}' port='{port}' password='{dbpassword}'")
+  return conn
+
+def tableExists(cursor, tableName):
+  cursor.execute(f"select * from information_schema.tables where table_name='{tableName}';")
+  return bool(cursor.rowcount)
+
+def createTable(connection, tableName, tableSchema):
+  cursor = connection.cursor()
+  cmd = f"create table {tableName} ({tableSchema});"
+  cursor.execute(cmd)
+  cursor.close()
+  connection.commit()
+  return bool(cursor.rowcount)
+
+
+def initDbTablesIfNeeded():
+  connection = initConnection()
+  cursor = connection.cursor()
+
+  res = tableExists(cursor, jenkinsBuildsTableName)
+  if not res:
+    cursor.execute(jenkinsJobsCreateTableQuery)
+    if not bool(cursor.rowcount):
+      raise Exception(f"Failed to create table {jenkinsBuildsTableName}")
+
+  cursor.close()
+  connection.commit()
+
+  connection.close()
+
+def fetchSyncedJobsBuildVersions(cursor):
+  fetchQuery = f'''
+  select job_name, max(build_id)
+  from {jenkinsBuildsTableName}
+  group by job_name
+  '''
+
+  cursor.execute(fetchQuery)
+  return dict(cursor.fetchall())
+
+def fetchBuildsForJob(jobUrl):
+  durFields = 'blockedDurationMillis,buildableDurationMillis,buildingDurationMillis,executingTimeMillis,queuingDurationMillis,totalDurationMillis,waitingDurationMillis'
+  fields = f'result,timestamp,id,url,builtOn,building,duration,estimatedDuration,fullDisplayName,actions[{durFields}]'
+  url = f'{jobUrl}api/json?depth=1&tree=builds[{fields}]'
+  r = requests.get(url)
+  return r.json()[u'builds']
+
+def buildRowValuesArray(jobName, build):
+  timings = next((x for x in build[u'actions'] if (u'_class' in x) and (x[u'_class'] == u'jenkins.metrics.impl.TimeInQueueAction')), None)
+  values = [jobName,
+  int(build[u'id']),
+  build[u'url'],
+  build[u'result'],
+  datetime.fromtimestamp(build[u'timestamp'] / 1000),
+  buil

[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=139007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139007
 ]

ASF GitHub Bot logged work on BEAM-5240:


Author: ASF GitHub Bot
Created on: 28/Aug/18 22:21
Start Date: 28/Aug/18 22:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6277: 
[BEAM-5240] Add metrics dashboard deployment script and logic
URL: https://github.com/apache/beam/pull/6277#discussion_r213490987
 
 

 ##
 File path: .test-infra/metrics/sync/jenkins/syncjenkins.py
 ##
 @@ -0,0 +1,192 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+
+# Queries Jenkins to collect metrics and put them in bigquery.
+import time
+import requests
+import psycopg2
+import os
+import re
+from datetime import datetime, timedelta
+import sys
+from xml.etree import ElementTree
+
+
+# Keeping this as reference for localhost debug
+# Fetching docker host machine ip for testing purposes.
+# Actual host should be used for production.
+# import subprocess
+# cmd_out = subprocess.check_output(["ip", "route", "show"]).decode("utf-8")
+# host = cmd_out.split(" ")[2]
+
+host = os.environ['JENSYNC_HOST']
+port = os.environ['JENSYNC_PORT']
+dbname = os.environ['JENSYNC_DBNAME']
+dbusername = os.environ['JENSYNC_DBUSERNAME']
+dbpassword = os.environ['JENSYNC_DBPWD']
+
+jenkinsBuildsTableName = 'jenkins_builds'
+
+jenkinsJobsCreateTableQuery = f"""
+create table {jenkinsBuildsTableName} (
+job_name varchar NOT NULL,
+build_id integer NOT NULL,
+build_url varchar,
+build_result varchar,
+build_timestamp TIMESTAMP,
+build_builtOn varchar,
+build_duration integer,
+build_estimatedDuration integer,
+build_fullDisplayName varchar,
+timing_blockedDurationMillis integer,
+timing_buildableDurationMillis integer,
+timing_buildingDurationMillis integer,
+timing_executingTimeMillis integer,
+timing_queuingDurationMillis integer,
+timing_totalDurationMillis integer,
+timing_waitingDurationMillis integer,
+primary key(job_name, build_id)
+)
+"""
+
+# returns (jobName, lastBuildId, jobUrl)
 
 Review comment:
   Remove?




Issue Time Tracking
---

Worklog Id: (was: 139007)
Time Spent: 50m  (was: 40m)

> Create post-commit tests dashboard
> --
>
> Key: BEAM-5240
> URL: https://issues.apache.org/jira/browse/BEAM-5240
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=139014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139014
 ]

ASF GitHub Bot logged work on BEAM-5240:


Author: ASF GitHub Bot
Created on: 28/Aug/18 22:21
Start Date: 28/Aug/18 22:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #6277: 
[BEAM-5240] Add metrics dashboard deployment script and logic
URL: https://github.com/apache/beam/pull/6277#discussion_r213490494
 
 

 ##
 File path: .test-infra/metrics/sync/jenkins/syncjenkins.py
 ##
 @@ -0,0 +1,192 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+
+# Queries Jenkins to collect metrics and put them in bigquery.
+import time
+import requests
+import psycopg2
+import os
+import re
+from datetime import datetime, timedelta
+import sys
+from xml.etree import ElementTree
+
+
+# Keeping this as reference for localhost debug
+# Fetching docker host machine ip for testing purposes.
+# Actual host should be used for production.
+# import subprocess
+# cmd_out = subprocess.check_output(["ip", "route", "show"]).decode("utf-8")
+# host = cmd_out.split(" ")[2]
+
+host = os.environ['JENSYNC_HOST']
+port = os.environ['JENSYNC_PORT']
+dbname = os.environ['JENSYNC_DBNAME']
+dbusername = os.environ['JENSYNC_DBUSERNAME']
+dbpassword = os.environ['JENSYNC_DBPWD']
+
+jenkinsBuildsTableName = 'jenkins_builds'
+
+jenkinsJobsCreateTableQuery = f"""
+create table {jenkinsBuildsTableName} (
+job_name varchar NOT NULL,
+build_id integer NOT NULL,
+build_url varchar,
+build_result varchar,
+build_timestamp TIMESTAMP,
+build_builtOn varchar,
+build_duration integer,
+build_estimatedDuration integer,
+build_fullDisplayName varchar,
+timing_blockedDurationMillis integer,
+timing_buildableDurationMillis integer,
+timing_buildingDurationMillis integer,
+timing_executingTimeMillis integer,
+timing_queuingDurationMillis integer,
+timing_totalDurationMillis integer,
+timing_waitingDurationMillis integer,
+primary key(job_name, build_id)
+)
+"""
+
+# returns (jobName, lastBuildId, jobUrl)
+def fetchJobs():
+  url = 'https://builds.apache.org/view/A-D/view/Beam/api/json?tree=jobs[name,url,lastCompletedBuild[id]]&depth=1'
+  r = requests.get(url)
+  jobs = r.json()[u'jobs']
+  print("---")
+  print(jobs)
+  result = map(lambda x: (x['name'], int(x['lastCompletedBuild']['id']) if x['lastCompletedBuild'] is not None else -1, x['url']), jobs)
 
 Review comment:
   Maybe format this line a bit? : )
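
One possible layout for the line under review, offered only as a sketch. The helper name `summarize_jobs` and the sample data are hypothetical and do not appear in the PR; the sketch assumes `jobs` is the list parsed from the Jenkins JSON response, as in `fetchJobs`, and it returns a list rather than the lazy `map` object used in the diff.

```python
def summarize_jobs(jobs):
  """Return (jobName, lastBuildId, jobUrl) tuples; lastBuildId is -1 when no build has completed."""
  result = []
  for job in jobs:
    last_build = job['lastCompletedBuild']
    last_build_id = int(last_build['id']) if last_build is not None else -1
    result.append((job['name'], last_build_id, job['url']))
  return result


# Hypothetical sample shaped like the Jenkins `jobs` array read in the diff.
sample_jobs = [
  {'name': 'job_a', 'url': 'https://example.invalid/job/job_a/', 'lastCompletedBuild': {'id': '42'}},
  {'name': 'job_b', 'url': 'https://example.invalid/job/job_b/', 'lastCompletedBuild': None},
]
print(summarize_jobs(sample_jobs))
```

Naming the intermediate values keeps each line short and makes the -1 fallback for jobs without a completed build easier to spot.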




Issue Time Tracking
---

Worklog Id: (was: 139014)
Time Spent: 1h 10m  (was: 1h)

> Create post-commit tests dashboard
> --
>
> Key: BEAM-5240
> URL: https://issues.apache.org/jira/browse/BEAM-5240
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5036?focusedWorklogId=138969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138969
 ]

ASF GitHub Bot logged work on BEAM-5036:


Author: ASF GitHub Bot
Created on: 28/Aug/18 19:47
Start Date: 28/Aug/18 19:47
Worklog Time Spent: 10m 
  Work Description: timrobertson100 edited a comment on issue #6289: 
[BEAM-5036] Optimize the FileBasedSink WriteOperation.moveToOutput()
URL: https://github.com/apache/beam/pull/6289#issuecomment-416716699
 
 
   I should add I opted to ask you to review as I think this needs
consideration by someone with experience in the FileSystem implementations.
I'm happy if we want to wait until after 2.7.0 is cut, so we have the full 6
weeks of testing in 2.8.0 to flush out any issues.




Issue Time Tracking
---

Worklog Id: (was: 138969)
Time Spent: 1h  (was: 50m)

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by
> copy+delete. It would be better to use rename(), which can be much more
> efficient on some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to this
> for the case of the HDFS filesystem.
> The feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E
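
The difference between the two strategies in the description can be sketched outside Beam with Python's standard library. This is an illustration only and not Beam's FileSystems API; `move_by_copy_delete`, `move_by_rename`, and `demo` are hypothetical names, and the sketch assumes source and destination directories live on the same local filesystem.

```python
import os
import shutil
import tempfile


def move_by_copy_delete(src, dst):
  """Move by copying all bytes, then deleting the source (cost grows with file size)."""
  shutil.copyfile(src, dst)
  os.remove(src)


def move_by_rename(src, dst):
  """Move with a single rename; within one filesystem no file contents are copied."""
  os.rename(src, dst)


def demo():
  """Move one file with each strategy across directories and report the outcome."""
  base = tempfile.mkdtemp()
  try:
    tmp_dir = os.path.join(base, 'tmp')
    out_dir = os.path.join(base, 'out')
    os.mkdir(tmp_dir)
    os.mkdir(out_dir)
    results = []
    for name, move in (('a.txt', move_by_copy_delete), ('b.txt', move_by_rename)):
      src = os.path.join(tmp_dir, name)
      dst = os.path.join(out_dir, name)
      with open(src, 'w') as f:
        f.write('payload')
      move(src, dst)  # cross-directory move, the case the issue calls out
      with open(dst) as f:
        results.append((os.path.exists(src), f.read()))
    return results
  finally:
    shutil.rmtree(base)
```

Both strategies leave the same final state; the rename variant simply avoids rewriting the file's bytes, which is where the saving would come from on filesystems that support it.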





[jira] [Work logged] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5036?focusedWorklogId=138968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138968
 ]

ASF GitHub Bot logged work on BEAM-5036:


Author: ASF GitHub Bot
Created on: 28/Aug/18 19:47
Start Date: 28/Aug/18 19:47
Worklog Time Spent: 10m 
  Work Description: timrobertson100 edited a comment on issue #6289: 
[BEAM-5036] Optimize the FileBasedSink WriteOperation.moveToOutput()
URL: https://github.com/apache/beam/pull/6289#issuecomment-416716699
 
 
   I should add that I opted to ask you to review because I think this needs 
consideration by someone with experience in the FileSystem implementations. 
I'm happy to wait until 2.7.0 is cut, so we have the full 6 weeks 
of testing in 2.8.0 to flush out any issues. 




Issue Time Tracking
---

Worklog Id: (was: 138968)
Time Spent: 50m  (was: 40m)

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use rename(), which can be much more 
> efficient on some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to 
> this for the HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Work logged] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5036?focusedWorklogId=138966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138966
 ]

ASF GitHub Bot logged work on BEAM-5036:


Author: ASF GitHub Bot
Created on: 28/Aug/18 19:46
Start Date: 28/Aug/18 19:46
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #6289: [BEAM-5036] 
Optimize the FileBasedSink WriteOperation.moveToOutput()
URL: https://github.com/apache/beam/pull/6289#issuecomment-416716699
 
 
   I should add that I opted to ask you to review because I think this needs 
consideration by someone who might have experience in the FileSystem 
implementations. I'm happy to wait until 2.7.0 is cut, so we 
have the full 6 weeks of testing in 2.8.0 to flush out any issues. 




Issue Time Tracking
---

Worklog Id: (was: 138966)
Time Spent: 40m  (was: 0.5h)

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use rename(), which can be much more 
> efficient on some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to 
> this for the HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Work logged] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5036?focusedWorklogId=138965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138965
 ]

ASF GitHub Bot logged work on BEAM-5036:


Author: ASF GitHub Bot
Created on: 28/Aug/18 19:44
Start Date: 28/Aug/18 19:44
Worklog Time Spent: 10m 
  Work Description: timrobertson100 commented on issue #6289: [BEAM-5036] 
Optimize the FileBasedSink WriteOperation.moveToOutput()
URL: https://github.com/apache/beam/pull/6289#issuecomment-416715865
 
 
   Only on HDFS @reuvenlax 
   
   This is the change I was testing from the discussion we recently had on 
[dev@](https://lists.apache.org/thread.html/b904779aefec1b6d01d28a492f626075de009a71587e1b9df3aa0f2b@%3Cdev.beam.apache.org%3E)
   
   Repeating here - on a 10 node YARN CDH 5.12.2 cluster, rewriting a 1.5TB 
AvroIO file (code 
[here](https://github.com/gbif/beam-perf/tree/master/avro-to-avro)) I observed:
   
   ```
 - Spark API: 35 minutes
 - Beam AvroIO (2.6.0): 1.7hrs
 - Beam AvroIO with this rename() patch: 42 minutes
   ```
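For comparison, the three timings above can be reduced to ratios. The sketch below is purely illustrative: the class and method names are made up here, and the only inputs are the three figures quoted in the comment, with 1.7 hours taken as 102 minutes.

```java
public class RewriteTimings {
  /** Returns how many times faster the candidate run is than the baseline run. */
  public static double speedup(double baselineMinutes, double candidateMinutes) {
    return baselineMinutes / candidateMinutes;
  }

  public static void main(String[] args) {
    double sparkMinutes = 35.0;
    double beam260Minutes = 1.7 * 60.0; // 1.7 hours expressed in minutes
    double patchedMinutes = 42.0;

    // Ratio of the unpatched Beam 2.6.0 runtime to the patched runtime.
    System.out.printf("patched vs 2.6.0: %.2fx faster%n",
        speedup(beam260Minutes, patchedMinutes));
    // Ratio of the patched Beam runtime to the Spark runtime.
    System.out.printf("patched vs Spark: %.2fx the runtime%n",
        patchedMinutes / sparkMinutes);
  }
}
```

On these figures the patched AvroIO rewrite takes well under half the 2.6.0 runtime and remains about 20% slower than the Spark job.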
   
   




Issue Time Tracking
---

Worklog Id: (was: 138965)
Time Spent: 0.5h  (was: 20m)

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use rename(), which can be much more 
> efficient on some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to 
> this for the HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Work logged] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5036?focusedWorklogId=138964&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138964
 ]

ASF GitHub Bot logged work on BEAM-5036:


Author: ASF GitHub Bot
Created on: 28/Aug/18 19:36
Start Date: 28/Aug/18 19:36
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #6289: [BEAM-5036] 
Optimize the FileBasedSink WriteOperation.moveToOutput()
URL: https://github.com/apache/beam/pull/6289#issuecomment-416713708
 
 
   Were you able to test the performance improvement here?




Issue Time Tracking
---

Worklog Id: (was: 138964)
Time Spent: 20m  (was: 10m)

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use rename(), which can be much more 
> efficient on some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to 
> this for the HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=138960&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138960
 ]

ASF GitHub Bot logged work on BEAM-5240:


Author: ASF GitHub Bot
Created on: 28/Aug/18 19:23
Start Date: 28/Aug/18 19:23
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on a change in pull request #6277: 
[BEAM-5240] Add metrics dashboard deployment script and logic
URL: https://github.com/apache/beam/pull/6277#discussion_r213441343
 
 

 ##
 File path: .test-infra/metrics/sync/jenkins/syncjenkins.py
 ##
 @@ -0,0 +1,192 @@
+#
 
 Review comment:
   That's a valid point. Do you know how we can run the linter on specific files?




Issue Time Tracking
---

Worklog Id: (was: 138960)
Time Spent: 40m  (was: 0.5h)

> Create post-commit tests dashboard
> --
>
> Key: BEAM-5240
> URL: https://issues.apache.org/jira/browse/BEAM-5240
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






Jenkins build is back to normal : beam_PerformanceTests_XmlIOIT_HDFS #598

2018-08-28 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PerformanceTests_HadoopInputFormat #699

2018-08-28 Thread Apache Jenkins Server
See 


--
[...truncated 522.53 KB...]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.postgresql.util.PSQLException: 
The connection attempt failed.
at 
org.apache.hadoop.mapreduce.lib.db.DBInputFormat.createConnection(DBInputFormat.java:205)
at 
org.apache.hadoop.mapreduce.lib.db.DBInputFormat.setConf(DBInputFormat.java:164)
... 18 more
Caused by: org.postgresql.util.PSQLException: The connection attempt failed.
at 
org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:257)
at 
org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:195)
at org.postgresql.Driver.makeConnection(Driver.java:452)
at org.postgresql.Driver.connect(Driver.java:254)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at 
org.apache.hadoop.mapreduce.lib.db.DBConfiguration.getConnection(DBConfiguration.java:154)
at 
org.apache.hadoop.mapreduce.lib.db.DBInputFormat.createConnection(DBInputFormat.java:198)
... 19 more
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.postgresql.core.PGStream.<init>(PGStream.java:69)
at 
org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:156)
... 27 more
java.lang.RuntimeException: java.lang.RuntimeException: 
org.postgresql.util.PSQLException: The connection attempt failed.
at 
org.apache.hadoop.mapreduce.lib.db.DBInputFormat.setConf(DBInputFormat.java:171)
at 
org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO$HadoopInputFormatBoundedSource.createInputFormatInstance(HadoopInputFormatIO.java:571)
at 
org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO$HadoopInputFormatBoundedSource.computeSplitsIfNecessary(HadoopInputFormatIO.java:527)
at 
org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO$HadoopInputFormatBoundedSource.split(HadoopInputFormatIO.java:487)
at 
com.google.cloud.dataflow.worker.WorkerCustomSources.splitAndValidate(WorkerCustomSources.java:275)
at 
com.google.cloud.dataflow.worker.WorkerCustomSources.performSplitTyped(WorkerCustomSources.java:197)
at 
com.google.cloud.dataflow.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:181)
at 
com.google.cloud.dataflow.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:160)
at 
com.google.cloud.dataflow.worker.WorkerCustomSourceOperationExecutor.execute(WorkerCustomSourceOperationExecutor.java:77)
at 
com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:393)
at 
com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:362)
at 
com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:290)
at 
com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
at 
com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
at 
com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.postgresql.util.PSQLException: 
The connection attempt failed.
at 
org.apache.hadoop.mapreduce.lib.db.DBInputFormat.createConnection(DBInputFormat.java:205)
at 
org.apache.hadoop.mapreduce.lib.db.DBInputFormat.setConf(DBInputFormat.java:164)
... 18 more
Caused by: org.postgresql.util.PSQLException: 

[jira] [Resolved] (BEAM-5143) Stop showing dependencies which are not able to upgraded in the weekly report

2018-08-28 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou resolved BEAM-5143.
-
   Resolution: Fixed
Fix Version/s: 2.7.0

> Stop showing dependencies which are not able to upgraded in the weekly report
> -
>
> Key: BEAM-5143
> URL: https://issues.apache.org/jira/browse/BEAM-5143
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
> Fix For: 2.7.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>






[jira] [Closed] (BEAM-5252) Fix the failing complex type test due to misusing reversed keyword of Calcite

2018-08-28 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang closed BEAM-5252.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> Fix the failing complex type test due to misusing reversed keyword of Calcite
> -
>
> Key: BEAM-5252
> URL: https://issues.apache.org/jira/browse/BEAM-5252
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Critical
> Fix For: Not applicable
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5252) Fix the failing complex type test due to misusing reversed keyword of Calcite

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5252?focusedWorklogId=138941&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138941
 ]

ASF GitHub Bot logged work on BEAM-5252:


Author: ASF GitHub Bot
Created on: 28/Aug/18 18:19
Start Date: 28/Aug/18 18:19
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on a change in pull request #6290: 
[BEAM-5252][SQL] Improve complext type tests
URL: https://github.com/apache/beam/pull/6290#discussion_r213421443
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
 ##
 @@ -175,15 +178,14 @@ public void testRowWithArray() {
 pipeline.run().waitUntilFinish(Duration.standardMinutes(2));
   }
 
-  @Ignore("https://issues.apache.org/jira/browse/BEAM-5189")
 
 Review comment:
   The change at line 40 makes this test pass. 
   
   Previously, `.addStringField("one")` used the Calcite reserved keyword `ONE`, 
which caused the test to fail.
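A minimal sketch of how such a clash could be caught before a schema reaches the SQL layer. Everything here is hypothetical: the class name, the helper, and the three-word `RESERVED` set are illustrative assumptions, not Calcite's actual reserved-word list or any Beam API. The comparison is case-insensitive because the lower-case field name `one` is reported above to clash with the keyword `ONE`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ReservedFieldNames {
  // Hypothetical subset for illustration only; not Calcite's real list.
  private static final Set<String> RESERVED =
      new HashSet<>(Arrays.asList("ONE", "ROW", "SELECT"));

  /** Returns the field names that match a reserved word, ignoring case. */
  public static List<String> findReserved(List<String> fieldNames) {
    List<String> hits = new ArrayList<>();
    for (String name : fieldNames) {
      if (RESERVED.contains(name.toUpperCase(Locale.ROOT))) {
        hits.add(name);
      }
    }
    return hits;
  }
}
```

With this assumed set, a field named `one` would be flagged while `col_one` would not, which is the kind of rename the test change relies on.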




Issue Time Tracking
---

Worklog Id: (was: 138941)
Time Spent: 50m  (was: 40m)

> Fix the failing complex type test due to misusing reversed keyword of Calcite
> -
>
> Key: BEAM-5252
> URL: https://issues.apache.org/jira/browse/BEAM-5252
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Critical
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-4696) Execute Jenkins website tests in a Docker container

2018-08-28 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595396#comment-16595396
 ] 

Udi Meiri commented on BEAM-4696:
-

https://github.com/apache/beam/pull/6282 is out for review, which implements 
pre-commits in docker containers.

> Execute Jenkins website tests in a Docker container
> ---
>
> Key: BEAM-4696
> URL: https://issues.apache.org/jira/browse/BEAM-4696
> Project: Beam
>  Issue Type: Improvement
>  Components: testing, website
>Reporter: Scott Wegner
>Assignee: Udi Meiri
>Priority: Major
>
> Currently, the website tests run in a vanilla Linux environment, which 
> requires a prerequisite step to install Ruby. The install script is flaky and 
> adds extra time to the job.
> Instead, we should run the website pre-commits inside the pre-built ruby/2.5 
> Docker image so that we don't need to worry about installing extra 
> dependencies.





[jira] [Work logged] (BEAM-5252) Fix the failing complex type test due to misusing reversed keyword of Calcite

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5252?focusedWorklogId=138940&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138940
 ]

ASF GitHub Bot logged work on BEAM-5252:


Author: ASF GitHub Bot
Created on: 28/Aug/18 18:17
Start Date: 28/Aug/18 18:17
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #6290: 
[BEAM-5252][SQL] Improve complext type tests
URL: https://github.com/apache/beam/pull/6290#discussion_r213420877
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
 ##
 @@ -175,15 +178,14 @@ public void testRowWithArray() {
 pipeline.run().waitUntilFinish(Duration.standardMinutes(2));
   }
 
-  @Ignore("https://issues.apache.org/jira/browse/BEAM-5189")
 
 Review comment:
   Never mind; I see from your description that you were using reserved keywords.




Issue Time Tracking
---

Worklog Id: (was: 138940)
Time Spent: 40m  (was: 0.5h)

> Fix the failing complex type test due to misusing reversed keyword of Calcite
> -
>
> Key: BEAM-5252
> URL: https://issues.apache.org/jira/browse/BEAM-5252
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Critical
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5252) Fix the failing complex type test due to misusing reversed keyword of Calcite

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5252?focusedWorklogId=138939&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138939
 ]

ASF GitHub Bot logged work on BEAM-5252:


Author: ASF GitHub Bot
Created on: 28/Aug/18 18:17
Start Date: 28/Aug/18 18:17
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #6290: 
[BEAM-5252][SQL] Improve complext type tests
URL: https://github.com/apache/beam/pull/6290#discussion_r213420617
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
 ##
 @@ -175,15 +178,14 @@ public void testRowWithArray() {
 pipeline.run().waitUntilFinish(Duration.standardMinutes(2));
   }
 
-  @Ignore("https://issues.apache.org/jira/browse/BEAM-5189")
 
 Review comment:
   I don't see anything that changed in this test. Did this pass before?




Issue Time Tracking
---

Worklog Id: (was: 138939)
Time Spent: 0.5h  (was: 20m)

> Fix the failing complex type test due to misusing reversed keyword of Calcite
> -
>
> Key: BEAM-5252
> URL: https://issues.apache.org/jira/browse/BEAM-5252
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Critical
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5252) Fix the failing complex type test due to misusing reversed keyword of Calcite

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5252?focusedWorklogId=138913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138913
 ]

ASF GitHub Bot logged work on BEAM-5252:


Author: ASF GitHub Bot
Created on: 28/Aug/18 18:09
Start Date: 28/Aug/18 18:09
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #6290: [BEAM-5252][SQL] 
Improve complext type tests
URL: https://github.com/apache/beam/pull/6290#issuecomment-416686302
 
 
   R: @akedin 
   CC: @apilloud 




Issue Time Tracking
---

Worklog Id: (was: 138913)
Time Spent: 20m  (was: 10m)

> Fix the failing complex type test due to misusing reversed keyword of Calcite
> -
>
> Key: BEAM-5252
> URL: https://issues.apache.org/jira/browse/BEAM-5252
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Critical
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5252) Fix the failing complex type test due to misusing reversed keyword of Calcite

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5252?focusedWorklogId=138911&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138911
 ]

ASF GitHub Bot logged work on BEAM-5252:


Author: ASF GitHub Bot
Created on: 28/Aug/18 18:08
Start Date: 28/Aug/18 18:08
Worklog Time Spent: 10m 
  Work Description: amaliujia opened a new pull request #6290: 
[BEAM-5252][SQL] Improve complext type tests
URL: https://github.com/apache/beam/pull/6290
 
 
   One of the complex type tests failed because it misused a Calcite reserved 
keyword. This PR fixes it.
   
   In addition, this PR adds another unsupported complex test with `@Ignore` to 
test `SELECT row.row`.
   
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | --- | --- | --- | ---
   
   
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 138911)
Time Spent: 10m
Remaining Estimate: 0h

> Fix the failing complex type test due to misusing reversed keyword of Calcite
> -
>
> Key: BEAM-5252
> URL: https://issues.apache.org/jira/browse/BEAM-5252
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>   

[jira] [Created] (BEAM-5252) Fix the failing complex type test due to misusing reversed keyword of Calcite

2018-08-28 Thread Rui Wang (JIRA)
Rui Wang created BEAM-5252:
--

 Summary: Fix the failing complex type test due to misusing 
reversed keyword of Calcite
 Key: BEAM-5252
 URL: https://issues.apache.org/jira/browse/BEAM-5252
 Project: Beam
  Issue Type: Sub-task
  Components: dsl-sql
Reporter: Rui Wang
Assignee: Rui Wang








[jira] [Work logged] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5036?focusedWorklogId=138910&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138910
 ]

ASF GitHub Bot logged work on BEAM-5036:


Author: ASF GitHub Bot
Created on: 28/Aug/18 18:04
Start Date: 28/Aug/18 18:04
Worklog Time Spent: 10m 
  Work Description: timrobertson100 opened a new pull request #6289: 
[BEAM-5036] Optimize the FileBasedSink WriteOperation.moveToOutput()
URL: https://github.com/apache/beam/pull/6289
 
 
   A very simple change that makes moving temporary files into the target 
folder faster for `HDFSFilesystem` and `LocalFilesystem` but is not 
intended to affect `GcsFilesystem` or `S3Filesystem`. The reasoning is in 
the Jira comments.
   
   I have opted against logging a warning if a user tries to rename across FS 
implementations, as I am not yet sure what the consequences are if the source is 
an HDFSFilesystem. I was thinking we could add that later.
   
   This can only be applied after https://github.com/apache/beam/pull/6285 is 
merged.
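The difference between the two strategies can be sketched as follows. This is not the code in the pull request: it uses `java.nio.file` on a local filesystem as a stand-in for Beam's filesystem abstraction, and the class name `MoveSketch` and its method are hypothetical. It attempts a rename first and falls back to copy+delete only when the filesystem reports that an atomic move is not supported.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class MoveSketch {
  /**
   * Moves src to dst, preferring a rename over copy+delete.
   * Returns true if the rename path was used, false if it fell back.
   */
  public static boolean moveToOutput(Path src, Path dst) throws IOException {
    try {
      // A rename only updates metadata on filesystems that support it.
      Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
      return true;
    } catch (AtomicMoveNotSupportedException e) {
      // Fallback: copy every byte, then delete the source.
      Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
      Files.delete(src);
      return false;
    }
  }
}
```

Either branch leaves the data at `dst` and removes `src`; the rename branch avoids rewriting the file contents, which is where the saving on large outputs would come from.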
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 138910)
Time Spent: 10m
Remaining Esti

[jira] [Closed] (BEAM-5121) Investigate flattening issue of nested Row

2018-08-28 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang closed BEAM-5121.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> Investigate flattening issue of nested Row
> --
>
> Key: BEAM-5121
> URL: https://issues.apache.org/jira/browse/BEAM-5121
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-28 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595337#comment-16595337
 ] 

Tim Robertson edited comment on BEAM-5036 at 8/28/18 5:58 PM:
--

For info on the other rename() methods:
 * {{S3FileSystem}} implements {{rename()}} as a [copy and delete|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystem.java#L597]
 * {{GcsFileSystem}} implements {{rename()}} as a [copy and delete|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystem.java#L122]
 * {{LocalFileSystem}} implements {{rename()}} by [making the parent directory if necessary|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystem.java#L164] and then doing a file move
 * {{HDFSFileSystem}}, following BEAM-4861 (fixed and ready to merge), now implements {{rename()}} by creating missing parent directories and doing the move

A move across different filesystems is not (fully) supported because {{FileSystems.rename}} gets only the [filesystem for the source resource|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java#L325]. It is not clear to me what might happen if the source were an {{HDFSFilesystem}}, which itself can span multiple filesystems. It is also not currently clear to me where we can best do the check; we could simply log a warning before the call to rename().
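
The warning idea in the comment above could look roughly like the following. This is a minimal sketch in Python rather than Beam's Java code: `scheme_of` and `warn_if_cross_filesystem` are hypothetical helpers, not Beam APIs, and comparing URI schemes is only an assumption about how "different filesystems" might be detected. It would not catch the case raised for HDFS, where a single scheme can span multiple filesystems.

```python
import logging
from urllib.parse import urlparse

logger = logging.getLogger("rename_check")


def scheme_of(path: str) -> str:
    """Return the URI scheme of a path; scheme-less paths are treated as local files."""
    return urlparse(path).scheme or "file"


def warn_if_cross_filesystem(src: str, dst: str) -> bool:
    """Log a warning when src and dst have different schemes.

    Returns True when the schemes match, False otherwise.
    """
    src_scheme, dst_scheme = scheme_of(src), scheme_of(dst)
    if src_scheme != dst_scheme:
        # Hypothetical message text; the real wording would be up to the implementer.
        logger.warning(
            "rename() from %s to %s crosses filesystems (%s -> %s)",
            src, dst, src_scheme, dst_scheme,
        )
        return False
    return True
```

A caller could run this check just before delegating to the rename and decide separately whether to fail or continue.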


was (Author: timrobertson100):
For info on the other rename() methods:
 * {{S3FileSystem}} implements {{rename()}} as a [copy and 
delete|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystem.java#L597]
 * {{GcsFileSystem}} implements {{rename()}} as a [copy and 
delete|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystem.java#L122]
 * {{LocalFileSystem}} implements {{rename()}} by [making the parent directory 
if 
necessary|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystem.java#L164]
 and then does a file move
 * {{HDFSFileSystem}} following BEAM-4861 (fixed and ready to merge) now 
implements {{rename()}} by creating missing parent directories and doing the 
move

The move across different filesystems is not (fully) supported because the 
{{FileSystems.rename}} gets only the [filesystem for the source 
resource|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java#L325].
 It is not clear to me what might happen if the source were an 
{{HDFSFilesystem}} which itself can span multiple Filesystems. It is also not 
currently clear to me where we can best do the check.

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>
> moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename(), which can be much more 
> efficient for some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of the HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E
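
The difference between the two strategies in the description can be sketched on a local filesystem as follows. This is an illustrative Python sketch, not the FileBasedSink code: `move_to_output` here is a hypothetical standalone function, and the fallback to copy+delete on any `OSError` is an assumption made for the example.

```python
import os
import shutil


def move_to_output(src: str, dst: str) -> str:
    """Move src to dst, preferring rename and falling back to copy+delete.

    Returns the name of the strategy that was used, for illustration only.
    """
    parent = os.path.dirname(dst)
    if parent:
        # Create missing parent directories so a cross-directory rename can succeed.
        os.makedirs(parent, exist_ok=True)
    try:
        # A rename is typically a metadata-only operation on the same filesystem.
        os.rename(src, dst)
        return "rename"
    except OSError:
        # Rename can fail (for example across devices); copy the bytes, then delete.
        shutil.copy2(src, dst)
        os.remove(src)
        return "copy+delete"
```

On object stores such as S3 or GCS the rename itself is implemented as copy and delete, so the gain applies mainly to filesystems with a native rename.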





[jira] [Updated] (BEAM-5251) Throw meaningful exception when have Calcite reversed words in query

2018-08-28 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-5251:
---
Issue Type: Improvement  (was: Bug)

> Throw meaningful exception when have Calcite reversed words in query
> 
>
> Key: BEAM-5251
> URL: https://issues.apache.org/jira/browse/BEAM-5251
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
>
> Calcite throws a hard-to-understand error message when a query wrongly uses Calcite 
> reserved keywords. It would be better to throw something useful in this 
> case.





[jira] [Created] (BEAM-5251) Throw meaningful exception when have Calcite reversed words in query

2018-08-28 Thread Rui Wang (JIRA)
Rui Wang created BEAM-5251:
--

 Summary: Throw meaningful exception when have Calcite reversed 
words in query
 Key: BEAM-5251
 URL: https://issues.apache.org/jira/browse/BEAM-5251
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Reporter: Rui Wang


Calcite throws a hard-to-understand error message when a query wrongly uses Calcite 
reserved keywords. It would be better to throw something useful in this case.
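
One way to picture "something useful" is a check that names the offending identifier before the query reaches the parser. The sketch below is in Python and purely illustrative: `RESERVED_WORDS` is a small made-up subset, not Calcite's actual reserved-word list, and `check_identifiers` and `ReservedWordError` are hypothetical names, not Beam or Calcite APIs.

```python
# Illustrative subset only; NOT Calcite's actual reserved-word list.
RESERVED_WORDS = {"SELECT", "FROM", "WHERE", "GROUP", "ORDER", "USER", "VALUE"}


class ReservedWordError(ValueError):
    """Raised when an identifier collides with a reserved word."""


def check_identifiers(identifiers):
    """Raise a readable error if any identifier is a reserved word."""
    clashes = [name for name in identifiers if name.upper() in RESERVED_WORDS]
    if clashes:
        raise ReservedWordError(
            "Identifier(s) %s are reserved words; quote or rename them"
            % ", ".join(sorted(clashes))
        )
```

An alternative with the same effect would be to catch the parser's exception and rethrow it with a message of this kind.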





[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=138906&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138906
 ]

ASF GitHub Bot logged work on BEAM-5240:


Author: ASF GitHub Bot
Created on: 28/Aug/18 17:55
Start Date: 28/Aug/18 17:55
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #6277: 
[BEAM-5240] Add metrics dashboard deployment script and logic
URL: https://github.com/apache/beam/pull/6277#discussion_r213409614
 
 

 ##
 File path: .test-infra/metrics/sync/jenkins/syncjenkins.py
 ##
 @@ -0,0 +1,192 @@
+#
 
 Review comment:
   I wish we had lint checks for code outside of sdks/python.




Issue Time Tracking
---

Worklog Id: (was: 138906)
Time Spent: 0.5h  (was: 20m)

> Create post-commit tests dashboard
> --
>
> Key: BEAM-5240
> URL: https://issues.apache.org/jira/browse/BEAM-5240
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=138907&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138907
 ]

ASF GitHub Bot logged work on BEAM-5240:


Author: ASF GitHub Bot
Created on: 28/Aug/18 17:55
Start Date: 28/Aug/18 17:55
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #6277: 
[BEAM-5240] Add metrics dashboard deployment script and logic
URL: https://github.com/apache/beam/pull/6277#discussion_r213402635
 
 

 ##
 File path: .test-infra/metrics/README.md
 ##
 @@ -0,0 +1,106 @@
+# BeamMonitoring
+This folder contains files required to spin-up metrics dashboard for Beam.
+
+## Utilized technologies
+* [Grafana](https://grafana.com) as dashboarding engine.
+* PostgreSQL as underlying DB.
+
+The approach is to fetch data from the corresponding systems (Jenkins, Jira, GithubArchives, etc.), put it into PostgreSQL, and fetch it from there to show in Grafana.
+
+## Local setup
+
+Install docker
+* install docker
+* https://docs.docker.com/install/#supported-platforms
+* install docker-compose
+* https://docs.docker.com/compose/install/#install-compose
+
+```sh
+# Remove old docker
+sudo apt-get remove docker docker-engine docker.io
+
+# Install docker
+sudo apt-get update
+sudo apt-get install \
+ apt-transport-https \
+ ca-certificates \
+ curl \
+ gnupg2 \
+ software-properties-common
+curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
+sudo apt-key fingerprint 0EBFCD88
+sudo add-apt-repository \
+   "deb [arch=amd64] https://download.docker.com/linux/debian \
+   $(lsb_release -cs) \
+   stable"
+sudo apt-get update
+sudo apt-get install docker-ce
+
+# Install docker-compose
+sudo curl -L https://github.com/docker/compose/releases/download/1.22.0/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose
+sudo chmod +x /usr/local/bin/docker-compose
+
+# start docker service if it is not running already
+sudo service docker start
+```
+
+## Kubernetes setup
+
+1. Configure gcloud & kubectl
+  * https://cloud.google.com/kubernetes-engine/docs/quickstart
+2. Configure PostgreSQL
+a. https://pantheon.corp.google.com/sql/instances?project=apache-beam-testing
+b. Check on this link to configure connection from kubernetes to postgresql: https://cloud.google.com/sql/docs/postgres/connect-kubernetes-engine
+3. add secrets for grafana
+a. `kubectl create secret generic grafana-admin-pwd 
--from-literal=grafana_admin_password=`
+4. create persistent volume claims:
+```sh
+kubectl create -f beam-grafana-etcdata-persistentvolumeclaim.yaml
+kubectl create -f beam-grafana-libdata-persistentvolumeclaim.yaml
+kubectl create -f beam-grafana-logdata-persistentvolumeclaim.yaml
+```
+5. Build and publish sync containers
+```sh
+cd sync/jenkins
+docker build -t gcr.io/${PROJECT_ID}/beammetricssyncjenkins:v1 .
+docker push gcr.io/${PROJECT_ID}/beammetricssyncjenkins:v1
+```
+6. Create deployment `kubectl create -f beamgrafana-deploy.yaml`
+
+## Kubernetes update
+https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
+
+1. Build and publish sync containers
+```sh
+cd sync/jenkins
+docker build -t gcr.io/${PROJECT_ID}/beammetricssyncjenkins:v1 .
+docker push gcr.io/${PROJECT_ID}/beammetricssyncjenkins:v1
+```
+1. Update image for container `kubectl set image deployment/beamgrafana 
container=`
+
+
+## Useful Kubernetes commands and hints
+```sh
+# Get pods
+kubectl get pods
+
+# Get detailed status
+kubectl describe pod 
+
+# Get logs
+kubectl log  
+
+# Set kubectl logging level: -v [1..10]
+https://github.com/kubernetes/kubernetes/issues/35054
+
+
+```
+## Useful docker commands and hints
+* Connect from one container to another
+* curl :
 
 Review comment:
   This needs some escaping or to be put in a code block.




Issue Time Tracking
---

Worklog Id: (was: 138907)
Time Spent: 0.5h  (was: 20m)

> Create post-commit tests dashboard
> --
>
> Key: BEAM-5240
> URL: https://issues.apache.org/jira/browse/BEAM-5240
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-5189) select Inner row of nested row does not work

2018-08-28 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-5189:
---
Description: We need to support SELECT row.row...row.  (was: Right now, the 
field access to nested row does not work for any access to sub row's field.

 

We need to support SELECT row.row...row.field.)

> select Inner row of nested row does not work
> 
>
> Key: BEAM-5189
> URL: https://issues.apache.org/jira/browse/BEAM-5189
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>
> We need to support SELECT row.row...row.





[jira] [Updated] (BEAM-5189) select Inner row of nested row does not work

2018-08-28 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-5189:
---
Summary: select Inner row of nested row does not work  (was: Field access 
of nested Row)

> select Inner row of nested row does not work
> 
>
> Key: BEAM-5189
> URL: https://issues.apache.org/jira/browse/BEAM-5189
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>
> Right now, the field access to nested row does not work for any access to sub 
> row's field.
>  
> We need to support SELECT row.row...row.field.





[jira] [Work logged] (BEAM-5249) org.apache.beam.runners.flink.streaming.UnboundedSourceWrapperTest ParameterizedUnboundedSourceWrapperTest hangs

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5249?focusedWorklogId=138893&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138893
 ]

ASF GitHub Bot logged work on BEAM-5249:


Author: ASF GitHub Bot
Created on: 28/Aug/18 17:29
Start Date: 28/Aug/18 17:29
Worklog Time Spent: 10m 
  Work Description: apilloud edited a comment on issue #6288: [BEAM-5249] 
Fix timeouts in beam_Release_Gradle_NightlySnapshot by extending time from 
100min to 150min
URL: https://github.com/apache/beam/pull/6288#issuecomment-416672701
 
 
   Though looking through the bugs I've seen, this probably addresses 
https://issues.apache.org/jira/browse/BEAM-5242
   
   so, LGTM




Issue Time Tracking
---

Worklog Id: (was: 138893)
Time Spent: 50m  (was: 40m)

> org.apache.beam.runners.flink.streaming.UnboundedSourceWrapperTest 
> ParameterizedUnboundedSourceWrapperTest hangs
> 
>
> Key: BEAM-5249
> URL: https://issues.apache.org/jira/browse/BEAM-5249
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Alan Myrvold
>Assignee: Alan Myrvold
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> beam_Release_Gradle_NightlySnapshot sometimes times out at 100 minutes
> [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/155/]
> [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/152/]
> https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/142/





[jira] [Work logged] (BEAM-5249) org.apache.beam.runners.flink.streaming.UnboundedSourceWrapperTest ParameterizedUnboundedSourceWrapperTest hangs

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5249?focusedWorklogId=138892&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138892
 ]

ASF GitHub Bot logged work on BEAM-5249:


Author: ASF GitHub Bot
Created on: 28/Aug/18 17:28
Start Date: 28/Aug/18 17:28
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #6288: [BEAM-5249] Fix 
timeouts in beam_Release_Gradle_NightlySnapshot by extending time from 100min 
to 150min
URL: https://github.com/apache/beam/pull/6288#issuecomment-416672701
 
 
   Though looking through the bugs I've seen, this probably addresses 
https://issues.apache.org/jira/browse/BEAM-5242\
   
   so, LGTM




Issue Time Tracking
---

Worklog Id: (was: 138892)
Time Spent: 40m  (was: 0.5h)

> org.apache.beam.runners.flink.streaming.UnboundedSourceWrapperTest 
> ParameterizedUnboundedSourceWrapperTest hangs
> 
>
> Key: BEAM-5249
> URL: https://issues.apache.org/jira/browse/BEAM-5249
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Alan Myrvold
>Assignee: Alan Myrvold
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> beam_Release_Gradle_NightlySnapshot sometimes times out at 100 minutes
> [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/155/]
> [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/152/]
> https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/142/





[jira] [Comment Edited] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-28 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595337#comment-16595337
 ] 

Tim Robertson edited comment on BEAM-5036 at 8/28/18 5:26 PM:
--

For info on the other rename() methods:
 * {{S3FileSystem}} implements {{rename()}} as a [copy and 
delete|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystem.java#L597]
 * {{GcsFileSystem}} implements {{rename()}} as a [copy and 
delete|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystem.java#L122]
 * {{LocalFileSystem}} implements {{rename()}} by [making the parent directory 
if 
necessary|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystem.java#L164]
 and then does a file move
 * {{HDFSFileSystem}} following BEAM-4861 (fixed and ready to merge) now 
implements {{rename()}} by creating missing parent directories and doing the 
move

The move across different filesystems is not (fully) supported because the 
{{FileSystems.rename}} gets only the [filesystem for the source 
resource|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java#L325].
 It is not clear to me what might happen if the source were an 
{{HDFSFilesystem}} which itself can span multiple Filesystems. It is also not 
currently clear to me where we can best do the check.


was (Author: timrobertson100):
For info on the other FileSystem rename():
 * {{S3FileSystem}} implements {{rename()}} as a [copy and 
delete|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystem.java#L597]
 * {{GcsFileSystem}} implements {{rename()}} as a [copy and 
delete|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystem.java#L122]
 * {{LocalFileSystem}} implements {{rename()}} by [making the parent directory 
if 
necessary|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystem.java#L164]
 and then does a file move
 * {{HDFSFileSystem}} following BEAM-4861 (fixed and ready to merge) now 
implements {{rename()}} by creating missing parent directories and doing the 
move

The move across different filesystems is not (fully) supported because the 
{{FileSystems.rename}} gets only the [filesystem for the source 
resource|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java#L325].
 It is not clear to me what might happen if the source were an 
{{HDFSFilesystem}} which itself can span multiple Filesystems. It is also not 
currently clear to me where we can best do the check.

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>
> moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename(), which can be much more 
> efficient for some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of the HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-28 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595337#comment-16595337
 ] 

Tim Robertson commented on BEAM-5036:
-

For info on the other FileSystem rename():
 * {{S3FileSystem}} implements {{rename()}} as a [copy and 
delete|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystem.java#L597]
 * {{GcsFileSystem}} implements {{rename()}} as a [copy and 
delete|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystem.java#L122]
 * {{LocalFileSystem}} implements {{rename()}} by [making the parent directory 
if 
necessary|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystem.java#L164]
 and then does a file move
 * {{HDFSFileSystem}} following BEAM-4861 (fixed and ready to merge) now 
implements {{rename()}} by creating missing parent directories and doing the 
move

The move across different filesystems is not (fully) supported because the 
{{FileSystems.rename}} gets only the [filesystem for the source 
resource|https://github.com/apache/beam/blob/release-2.6.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java#L325].
 It is not clear to me what might happen if the source were an 
{{HDFSFilesystem}} which itself can span multiple Filesystems. It is also not 
currently clear to me where we can best do the check.

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> --
>
> Key: BEAM-5036
> URL: https://issues.apache.org/jira/browse/BEAM-5036
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Affects Versions: 2.5.0
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>
> moveToOutput() methods in FileBasedSink.WriteOperation implement move by 
> copy+delete. It would be better to use a rename(), which can be much more 
> efficient for some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to this 
> for the case of the HDFS filesystem.
> Feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E





[jira] [Work logged] (BEAM-5249) org.apache.beam.runners.flink.streaming.UnboundedSourceWrapperTest ParameterizedUnboundedSourceWrapperTest hangs

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5249?focusedWorklogId=138891&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138891
 ]

ASF GitHub Bot logged work on BEAM-5249:


Author: ASF GitHub Bot
Created on: 28/Aug/18 17:25
Start Date: 28/Aug/18 17:25
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #6288: [BEAM-5249] Fix 
timeouts in beam_Release_Gradle_NightlySnapshot by extending time from 100min 
to 150min
URL: https://github.com/apache/beam/pull/6288#issuecomment-416671547
 
 
   I don't think this will actually fix the issue. The same failure hit a 
postcommit which timed out at 4 hours: 
https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1363/
   
   Looks like it is a failure in the flink runner, I've added details to the 
bug.
   
   cc: @boyuanzz 




Issue Time Tracking
---

Worklog Id: (was: 138891)
Time Spent: 0.5h  (was: 20m)

> org.apache.beam.runners.flink.streaming.UnboundedSourceWrapperTest 
> ParameterizedUnboundedSourceWrapperTest hangs
> 
>
> Key: BEAM-5249
> URL: https://issues.apache.org/jira/browse/BEAM-5249
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Alan Myrvold
>Assignee: Alan Myrvold
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> beam_Release_Gradle_NightlySnapshot sometimes times out at 100 minutes
> [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/155/]
> [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/152/]
> https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/142/





[jira] [Updated] (BEAM-5249) org.apache.beam.runners.flink.streaming.UnboundedSourceWrapperTest ParameterizedUnboundedSourceWrapperTest hangs

2018-08-28 Thread Andrew Pilloud (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Pilloud updated BEAM-5249:
-
Component/s: (was: build-system)
 runner-flink

> org.apache.beam.runners.flink.streaming.UnboundedSourceWrapperTest 
> ParameterizedUnboundedSourceWrapperTest hangs
> 
>
> Key: BEAM-5249
> URL: https://issues.apache.org/jira/browse/BEAM-5249
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Alan Myrvold
>Assignee: Alan Myrvold
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> beam_Release_Gradle_NightlySnapshot sometimes times out at 100 minutes
> [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/155/]
> [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/152/]
> https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/142/





[jira] [Commented] (BEAM-5249) beam_Release_Gradle_NightlySnapshot timeouts

2018-08-28 Thread Andrew Pilloud (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595335#comment-16595335
 ] 

Andrew Pilloud commented on BEAM-5249:
--

This just hit on PostCommit, it is a failure in the flink runner.

[https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1363/]
*17:52:45* [Test worker] INFO org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper - Unbounded Flink Source 0/4 is reading from sources: [org.apache.beam.runners.flink.streaming.TestCountingSource@13904536]
*21:11:45* Build timed out (after 240 minutes). Marking the build as aborted.

> beam_Release_Gradle_NightlySnapshot timeouts
> 
>
> Key: BEAM-5249
> URL: https://issues.apache.org/jira/browse/BEAM-5249
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Alan Myrvold
>Assignee: Alan Myrvold
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> beam_Release_Gradle_NightlySnapshot sometimes times out at 100 minutes
> [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/155/]
> [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/152/]
> https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/142/





[jira] [Updated] (BEAM-5249) org.apache.beam.runners.flink.streaming.UnboundedSourceWrapperTest ParameterizedUnboundedSourceWrapperTest hangs

2018-08-28 Thread Andrew Pilloud (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Pilloud updated BEAM-5249:
-
Summary: org.apache.beam.runners.flink.streaming.UnboundedSourceWrapperTest 
ParameterizedUnboundedSourceWrapperTest hangs  (was: 
beam_Release_Gradle_NightlySnapshot timeouts)

> org.apache.beam.runners.flink.streaming.UnboundedSourceWrapperTest 
> ParameterizedUnboundedSourceWrapperTest hangs
> 
>
> Key: BEAM-5249
> URL: https://issues.apache.org/jira/browse/BEAM-5249
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Alan Myrvold
>Assignee: Alan Myrvold
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> beam_Release_Gradle_NightlySnapshot sometimes times out at 100 minutes
> [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/155/]
> [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/152/]
> https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/142/





[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=138890&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138890
 ]

ASF GitHub Bot logged work on BEAM-5240:


Author: ASF GitHub Bot
Created on: 28/Aug/18 17:21
Start Date: 28/Aug/18 17:21
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #6277: [BEAM-5240] Add metrics 
dashboard deployment script and logic
URL: https://github.com/apache/beam/pull/6277#issuecomment-416670189
 
 
   run go precommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 138890)
Time Spent: 20m  (was: 10m)

> Create post-commit tests dashboard
> --
>
> Key: BEAM-5240
> URL: https://issues.apache.org/jira/browse/BEAM-5240
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-5250) Python Wordcount fails with Flink portable streaming

2018-08-28 Thread Thomas Weise (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595325#comment-16595325
 ] 

Thomas Weise commented on BEAM-5250:


./gradlew :beam-sdks-python:portableWordCount -PjobEndpoint=localhost:8099 
-Pstreaming

 
{code:java}
[flink-runner-job-server] ERROR 
org.apache.beam.runners.flink.FlinkJobInvocation - Error during job invocation 
BeamApp-tweise-0828171233-920700af_e4ab09ea-cda7-441c-9c1a-e9fa435133bb.

org.apache.flink.runtime.client.JobExecutionException: 
org.apache.beam.sdk.util.UserCodeException: java.lang.IllegalStateException: 
TimestampCombiner moved element from 294247-01-10T04:00:54.775Z 
(TIMESTAMP_MAX_VALUE) to earlier time 294247-01-09T04:00:54.775Z (end of global 
window) for window 
org.apache.beam.sdk.transforms.windowing.GlobalWindow@4e03f446

        at 
org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:625){code}
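
The two timestamps in the message above are exactly one day apart. A minimal sketch of that arithmetic follows, assuming that the maximum timestamp is Long.MAX_VALUE microseconds truncated to milliseconds and that the global window ends one day earlier. The class and method names are hypothetical and are not Beam APIs.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Beam: models the two bounds named in the error.
final class GlobalWindowBounds {
    // Assumption: the maximum timestamp is Long.MAX_VALUE microseconds, in milliseconds.
    static final long MAX_TIMESTAMP_MILLIS = Long.MAX_VALUE / 1000L;

    // Assumption: the global window ends one day before the maximum timestamp.
    static final long GLOBAL_WINDOW_END_MILLIS =
        MAX_TIMESTAMP_MILLIS - Duration.ofDays(1).toMillis();

    // True when a timestamp lies after the end of the global window, which is
    // the situation the TimestampCombiner check rejects in the log above.
    static boolean isPastGlobalWindowEnd(long timestampMillis) {
        return timestampMillis > GLOBAL_WINDOW_END_MILLIS;
    }

    public static void main(String[] args) {
        System.out.println(Instant.ofEpochMilli(MAX_TIMESTAMP_MILLIS));
        System.out.println(Instant.ofEpochMilli(GLOBAL_WINDOW_END_MILLIS));
        System.out.println(isPastGlobalWindowEnd(MAX_TIMESTAMP_MILLIS));
    }
}
```

Under these assumptions, an element stamped with the maximum timestamp cannot be assigned to the global window without moving it one day earlier, which matches the reported IllegalStateException.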

> Python Wordcount fails with Flink portable streaming
> 
>
> Key: BEAM-5250
> URL: https://issues.apache.org/jira/browse/BEAM-5250
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Priority: Major
>  Labels: portability
>






[jira] [Created] (BEAM-5250) Python Wordcount fails with Flink portable streaming

2018-08-28 Thread Thomas Weise (JIRA)
Thomas Weise created BEAM-5250:
--

 Summary: Python Wordcount fails with Flink portable streaming
 Key: BEAM-5250
 URL: https://issues.apache.org/jira/browse/BEAM-5250
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Thomas Weise








[jira] [Work logged] (BEAM-5249) beam_Release_Gradle_NightlySnapshot timeouts

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5249?focusedWorklogId=138883&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138883
 ]

ASF GitHub Bot logged work on BEAM-5249:


Author: ASF GitHub Bot
Created on: 28/Aug/18 17:11
Start Date: 28/Aug/18 17:11
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on issue #6288: [BEAM-5249] Fix 
timeouts in beam_Release_Gradle_NightlySnapshot by extending time from 100min 
to 150min
URL: https://github.com/apache/beam/pull/6288#issuecomment-41709
 
 
   +R: @apilloud PTAL?




Issue Time Tracking
---

Worklog Id: (was: 138883)
Time Spent: 20m  (was: 10m)

> beam_Release_Gradle_NightlySnapshot timeouts
> 
>
> Key: BEAM-5249
> URL: https://issues.apache.org/jira/browse/BEAM-5249
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Alan Myrvold
>Assignee: Alan Myrvold
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> beam_Release_Gradle_NightlySnapshot sometimes times out at 100 minutes
> [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/155/]
> [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/152/]
> https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/142/





[jira] [Work logged] (BEAM-5249) beam_Release_Gradle_NightlySnapshot timeouts

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5249?focusedWorklogId=138882&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138882
 ]

ASF GitHub Bot logged work on BEAM-5249:


Author: ASF GitHub Bot
Created on: 28/Aug/18 17:09
Start Date: 28/Aug/18 17:09
Worklog Time Spent: 10m 
  Work Description: alanmyrvold opened a new pull request #6288: 
[BEAM-5249] Fix timeouts by extending time from 100min to 150min
URL: https://github.com/apache/beam/pull/6288
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   




Issue Time Tracking
---

Worklog Id: (was: 138882)
Time Spent: 10m
Remaining Estimate: 0h

> beam_Release_Gradle_NightlySnapshot timeouts
> 
>
> Key: BEAM-5249
> URL: https://issues.apache.org/jira/browse/BEAM-5249
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Alan Myrvold
>Assignee: Alan Myrvold
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> beam_Release_Gradle_NightlySnapshot sometimes time

[jira] [Created] (BEAM-5249) beam_Release_Gradle_NightlySnapshot timeouts

2018-08-28 Thread Alan Myrvold (JIRA)
Alan Myrvold created BEAM-5249:
--

 Summary: beam_Release_Gradle_NightlySnapshot timeouts
 Key: BEAM-5249
 URL: https://issues.apache.org/jira/browse/BEAM-5249
 Project: Beam
  Issue Type: Bug
  Components: build-system
Reporter: Alan Myrvold
Assignee: Alan Myrvold


beam_Release_Gradle_NightlySnapshot sometimes times out at 100 minutes

[https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/155/]

[https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/152/]

https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/142/





[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=138851&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138851
 ]

ASF GitHub Bot logged work on BEAM-5240:


Author: ASF GitHub Bot
Created on: 28/Aug/18 16:18
Start Date: 28/Aug/18 16:18
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #6277: [BEAM-5240] Add 
metrics dashboard deployment script and logic
URL: https://github.com/apache/beam/pull/6277#issuecomment-416648781
 
 
   This is very cool. I'm doing a quick review today.




Issue Time Tracking
---

Worklog Id: (was: 138851)
Time Spent: 10m
Remaining Estimate: 0h

> Create post-commit tests dashboard
> --
>
> Key: BEAM-5240
> URL: https://issues.apache.org/jira/browse/BEAM-5240
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=138850&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138850
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 28/Aug/18 16:10
Start Date: 28/Aug/18 16:10
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #6287: [BEAM-5187] Add a 
ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287#issuecomment-416646255
 
 
   CC @tweise @angoenka 




Issue Time Tracking
---

Worklog Id: (was: 138850)
Time Spent: 20m  (was: 10m)

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without Docker. Analogous to the 
> {{DockerJobBundleFactory}}, a {{ProcessJobBundleFactory}} could be added to 
> directly fork SDK harness processes.
> Artifacts will be provided by an artifact directory or could be set up similarly 
> to the existing bootstrapping code ("boot.go") which we use for containers.
> The process-based execution can optionally be configured via the pipeline 
> options.
> [1] 
> [https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E]
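
A minimal sketch of what directly forking an SDK harness process could look like with the JDK's ProcessBuilder. This is not the code from the pull request. The class name, the executable path, and the endpoint values are hypothetical, and the flag names are an assumption modeled on the container contract that "boot.go" is described as implementing.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the ProcessJobBundleFactory from the pull request.
final class ProcessEnvironmentSketch {
    // Builds the command line for the harness executable. The flag names are
    // assumptions based on the container contract, not confirmed by this thread.
    static List<String> buildCommand(
            String executable,
            String workerId,
            String loggingEndpoint,
            String artifactEndpoint,
            String provisionEndpoint,
            String controlEndpoint) {
        List<String> command = new ArrayList<>();
        command.add(executable);
        command.add("--id=" + workerId);
        command.add("--logging_endpoint=" + loggingEndpoint);
        command.add("--artifact_endpoint=" + artifactEndpoint);
        command.add("--provision_endpoint=" + provisionEndpoint);
        command.add("--control_endpoint=" + controlEndpoint);
        return command;
    }

    // Forks the process and forwards its stdout/stderr to the parent.
    static Process start(List<String> command) throws IOException {
        return new ProcessBuilder(command).inheritIO().start();
    }

    public static void main(String[] args) {
        // Only prints the command; the executable path is a placeholder.
        List<String> command = buildCommand(
            "/path/to/boot", "worker-1",
            "localhost:50001", "localhost:50002",
            "localhost:50003", "localhost:50004");
        System.out.println(String.join(" ", command));
    }
}
```

Keeping command construction separate from process start makes the argument list easy to check without launching anything.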





[jira] [Work logged] (BEAM-5187) Create a ProcessJobBundleFactory for non-dockerized SDK harness

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5187?focusedWorklogId=138848&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138848
 ]

ASF GitHub Bot logged work on BEAM-5187:


Author: ASF GitHub Bot
Created on: 28/Aug/18 16:09
Start Date: 28/Aug/18 16:09
Worklog Time Spent: 10m 
  Work Description: mxm opened a new pull request #6287: [BEAM-5187] Add a 
ProcessJobBundleFactory for process-based execution
URL: https://github.com/apache/beam/pull/6287
 
 
   This adds a ProcessJobBundleFactory for process-based execution, which 
currently takes the Environment URL String and executes it with the regular 
parameters that are part of the container contract.
   
   As of now, the factory needs to be wired manually. This will change when the
   Runner API's Environment supports process-based execution.
   
   The next steps are:
   
   - Modify the Environment Proto to explicitly support process-based 
execution along with other parameters (e.g. target platform, OS).
   
   - Introduce artifact staging
   




Issue Time Tracking
---

Worklog Id: (was: 138848)
Time Spent: 10m
Remaining Estimate: 0h

> Create a ProcessJobBundleFactory for non-dockerized SDK harness
> ---
>
> Key: BEAM-5187
> URL: https://issues.apache.org/jira/browse/BEAM-5187
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed on the mailing list [1], we want to give users an option to 
> execute portable pipelines without D

[jira] [Work logged] (BEAM-3310) Push metrics to a backend in an runner agnostic way

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3310?focusedWorklogId=138843&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138843
 ]

ASF GitHub Bot logged work on BEAM-3310:


Author: ASF GitHub Bot
Created on: 28/Aug/18 15:38
Start Date: 28/Aug/18 15:38
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #4548: [BEAM-3310] Metrics 
pusher
URL: https://github.com/apache/beam/pull/4548#issuecomment-416634360
 
 
   @JozoVilcek yes, thanks for the reminder. I just commented on the ticket




Issue Time Tracking
---

Worklog Id: (was: 138843)
Time Spent: 18h 50m  (was: 18h 40m)

> Push metrics to a backend in an runner agnostic way
> ---
>
> Key: BEAM-3310
> URL: https://issues.apache.org/jira/browse/BEAM-3310
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-extensions-metrics, sdk-java-core
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 18h 50m
>  Remaining Estimate: 0h
>
> The idea is to avoid relying on the runners to provide access to the metrics 
> (either at the end of the pipeline or while it runs) because they do not all 
> have the same metrics capabilities (e.g. the Spark runner configures sinks 
> like CSV, Graphite or in-memory sinks using the Spark engine conf). The 
> target is to push the metrics in the common runner code so that, no matter the 
> chosen runner, users can get their metrics out of Beam.
> Here is the link to the discussion thread on the dev ML: 
> https://lists.apache.org/thread.html/01a80d62f2df6b84bfa41f05e15fda900178f882877c294fed8be91e@%3Cdev.beam.apache.org%3E
> And the design doc:
> https://s.apache.org/runner_independent_metrics_extraction





[jira] [Commented] (BEAM-5246) Beam metrics exported as flink metrics are not correct

2018-08-28 Thread Etienne Chauchot (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595163#comment-16595163
 ] 

Etienne Chauchot commented on BEAM-5246:


Maybe [~lzljs3620320] can take a look, as he developed metrics support in the 
Flink runner.

> Beam metrics exported as flink metrics are not correct
> --
>
> Key: BEAM-5246
> URL: https://issues.apache.org/jira/browse/BEAM-5246
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Aljoscha Krettek
>Priority: Major
>
> In the Flink UI and the Flink native MetricReporter, I am seeing too many instances of 
> my Beam metric counter. It looks like the counter is materialised for every 
> operator running within the task, although it is emitted from only one Beam 
> step (which should map to one operator?). This produces double counting.
> After a bit of debugging, I noticed this is happening for streaming jobs. In batch I was 
> not able to reproduce it. The problem might be in FlinkMetricContainer.
> [https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainer.java#L86]
> The update seems to be called from operators after finishing the bundle. Data 
> from the accumulator are flushed to `runtimeContext.getMetricGroup()`. The scope 
> of the accumulator seems to differ from that of the metricGroup: between 
> calls, the scope components change, especially operatorID. It 
> seems that during the run, `metricResult.getStep()` does not match the 
> operatorName of the metricGroup where the metric is being pushed.
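
The double counting described above can be illustrated with a toy model. This is only a sketch of the reported symptom under the assumption that one shared accumulator value is flushed into every operator's metric group. It does not use or reproduce the Flink or Beam metrics APIs, and all names in it are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the reported symptom; not Flink's or Beam's metrics API.
final class MetricScopeSketch {
    // Reported behaviour: the same counter value is registered in the metric
    // group of every operator in the task.
    static Map<String, Long> flushToAllOperators(List<String> operators, long value) {
        Map<String, Long> groups = new HashMap<>();
        for (String operator : operators) {
            groups.put(operator, value);
        }
        return groups;
    }

    // Expected behaviour: the value is registered only in the group of the
    // operator whose name matches the step that emitted the metric.
    static Map<String, Long> flushToEmittingStep(
            List<String> operators, String step, long value) {
        Map<String, Long> groups = new HashMap<>();
        for (String operator : operators) {
            if (operator.equals(step)) {
                groups.put(operator, value);
            }
        }
        return groups;
    }

    // What a reporter that sums the counter over all groups would see.
    static long total(Map<String, Long> groups) {
        long sum = 0;
        for (long value : groups.values()) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> operators = Arrays.asList("Read", "Count", "Write");
        System.out.println(total(flushToAllOperators(operators, 10L)));
        System.out.println(total(flushToEmittingStep(operators, "Count", 10L)));
    }
}
```

In this model the over-count grows with the number of operators in the task, which is consistent with seeing one counter instance per operator.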





[jira] [Created] (BEAM-5248) Euphoria to Beam translators should set coders to output `PCollections' rather than to input ones

2018-08-28 Thread Vaclav Plajt (JIRA)
Vaclav Plajt created BEAM-5248:
--

 Summary: Euphoria to Beam translators should set coders to output 
`PCollections' rather than to input ones
 Key: BEAM-5248
 URL: https://issues.apache.org/jira/browse/BEAM-5248
 Project: Beam
  Issue Type: Sub-task
  Components: dsl-euphoria
Reporter: Vaclav Plajt
Assignee: Vaclav Plajt


Euphoria to Beam translators set coders on new `PCollections` automatically. 
Every new `PCollection` gets a coder; even the coders of input `PCollections` are set. 
That causes trouble when any `Dataset`/`PCollection` is used as input to two or 
more operators. Set coders for output `PCollections` but not for input ones.

 





[jira] [Commented] (BEAM-5148) Implement MongoDB IO for Python SDK

2018-08-28 Thread Pascal Gula (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595059#comment-16595059
 ] 

Pascal Gula commented on BEAM-5148:
---

This might be useful? https://github.com/uqfoundation/dill/issues/207

> Implement MongoDB IO for Python SDK
> ---
>
> Key: BEAM-5148
> URL: https://issues.apache.org/jira/browse/BEAM-5148
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Affects Versions: 3.0.0
>Reporter: Pascal Gula
>Assignee: Pascal Gula
>Priority: Major
> Fix For: Not applicable
>
>
> Currently the Java SDK has MongoDB support but the Python SDK does not. With current 
> portability efforts, other runners may soon be able to use the Python SDK. Having 
> MongoDB support will allow these runners to execute large-scale jobs using it.
> Since we need this IO component at Peat, we started working on a PyPI package 
> available at this repository: [https://github.com/PEAT-AI/beam-extended]





[jira] [Commented] (BEAM-5148) Implement MongoDB IO for Python SDK

2018-08-28 Thread Pascal Gula (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595050#comment-16595050
 ] 

Pascal Gula commented on BEAM-5148:
---

[~chamikara], I am facing an issue with the Sink part of the connector, which I 
described in this Stack Overflow post: 
[https://stackoverflow.com/questions/52040923/error-trying-to-implement-a-mongodb-io-connector-sink]

Any help would be very useful!

 

> Implement MongoDB IO for Python SDK
> ---
>
> Key: BEAM-5148
> URL: https://issues.apache.org/jira/browse/BEAM-5148
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Affects Versions: 3.0.0
>Reporter: Pascal Gula
>Assignee: Pascal Gula
>Priority: Major
> Fix For: Not applicable
>
>
> Currently the Java SDK has MongoDB support but the Python SDK does not. With current 
> portability efforts, other runners may soon be able to use the Python SDK. Having 
> MongoDB support will allow these runners to execute large-scale jobs using it.
> Since we need this IO component at Peat, we started working on a PyPI package 
> available at this repository: [https://github.com/PEAT-AI/beam-extended]





[jira] [Work logged] (BEAM-5062) Add ability to configure S3ClientOptions

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5062?focusedWorklogId=138805&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138805
 ]

ASF GitHub Bot logged work on BEAM-5062:


Author: ASF GitHub Bot
Created on: 28/Aug/18 13:29
Start Date: 28/Aug/18 13:29
Worklog Time Spent: 10m 
  Work Description: yuppie-flu commented on issue #6122: [BEAM-5062] Add 
ability to provide custom S3ClientOptions
URL: https://github.com/apache/beam/pull/6122#issuecomment-416584155
 
 
   @iemejia I don't have a strong opinion on that point. However, I agree that, 
at least in the current version of the AWS Java library, the API for such settings is 
builder-oriented. I just don't like that the builder is so huge and contains a 
lot of other stuff. Nevertheless, that is irrelevant to Apache Beam.
   I reworked the PR to expose the whole builder. Could you please take a look?




Issue Time Tracking
---

Worklog Id: (was: 138805)
Time Spent: 2h 20m  (was: 2h 10m)

> Add ability to configure S3ClientOptions
> 
>
> Key: BEAM-5062
> URL: https://issues.apache.org/jira/browse/BEAM-5062
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Minor
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> It would be very useful to be able to configure 
> [S3ClientOptions|https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/S3ClientOptions.html]
>  for Apache Beam jobs.
> For example, some implementations of S3 do not support 
> virtual-hosted-style URLs for buckets, only path-style ones. Currently it is 
> impossible to enable path-style access for the Amazon S3 client used by 
> an Apache Beam job.
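
The difference between the two addressing styles mentioned above can be sketched with plain string building. This is an illustration only and does not use the AWS SDK. The class name, the endpoint host, the bucket, and the key are hypothetical.

```java
// Hypothetical illustration of the two S3 addressing styles; no AWS SDK calls.
final class S3UrlStyles {
    // Virtual-hosted style: the bucket name becomes part of the host name.
    static String virtualHostedStyle(String endpointHost, String bucket, String key) {
        return "https://" + bucket + "." + endpointHost + "/" + key;
    }

    // Path style: the bucket name is the first segment of the path.
    static String pathStyle(String endpointHost, String bucket, String key) {
        return "https://" + endpointHost + "/" + bucket + "/" + key;
    }

    public static void main(String[] args) {
        // "storage.example.com" stands in for an S3-compatible endpoint.
        System.out.println(virtualHostedStyle("storage.example.com", "my-bucket", "data/part-0.avro"));
        System.out.println(pathStyle("storage.example.com", "my-bucket", "data/part-0.avro"));
    }
}
```

An S3-compatible service without per-bucket host names can only resolve the second form, which is why the client needs an option to select it.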





[jira] [Work logged] (BEAM-5172) org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest is flaky

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5172?focusedWorklogId=138806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138806
 ]

ASF GitHub Bot logged work on BEAM-5172:


Author: ASF GitHub Bot
Created on: 28/Aug/18 13:29
Start Date: 28/Aug/18 13:29
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #6279:  [BEAM-5172] Fix 
Elasticsearch UTests flakiness
URL: https://github.com/apache/beam/pull/6279#issuecomment-416584295
 
 
   PTAL




Issue Time Tracking
---

Worklog Id: (was: 138806)
Time Spent: 1h 10m  (was: 1h)

> org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest is flaky
> -
>
> Key: BEAM-5172
> URL: https://issues.apache.org/jira/browse/BEAM-5172
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-elasticsearch, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In a recent PostCommit build, 
> https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1290/testReport/junit/org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest/testRead/
>  failed with:
> Error Message
> java.lang.AssertionError: Count/Flatten.PCollections.out: 
> Expected: <400L>
>  but: was <470L>
> Stacktrace
> java.lang.AssertionError: Count/Flatten.PCollections.out: 
> Expected: <400L>
>  but: was <470L>
>   at 
> org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:168)
>   at org.apache.beam.sdk.testing.PAssert.thatSingleton(PAssert.java:413)
>   at org.apache.beam.sdk.testing.PAssert.thatSingleton(PAssert.java:404)
>   at 
> org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.testRead(ElasticsearchIOTestCommon.java:124)
>   at 
> org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testRead(ElasticsearchIOTest.java:125)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.Ab

[jira] [Work logged] (BEAM-5172) org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest is flaky

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5172?focusedWorklogId=138804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138804
 ]

ASF GitHub Bot logged work on BEAM-5172:


Author: ASF GitHub Bot
Created on: 28/Aug/18 13:29
Start Date: 28/Aug/18 13:29
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6279:  
[BEAM-5172] Fix Elasticsearch UTests flakiness
URL: https://github.com/apache/beam/pull/6279#discussion_r213304721
 
 

 ##
 File path: 
sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java
 ##
 @@ -101,6 +101,14 @@ public static void beforeClass() throws IOException {
 restClient = connectionConfiguration.createClient();
 elasticsearchIOTestCommon =
 new ElasticsearchIOTestCommon(connectionConfiguration, restClient, false);
+while (restClient.performRequest("HEAD", "/").getStatusLine().getStatusCode() != 200) {
 
 Review comment:
   I added a maximum startup wait time in the second commit.
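
A bounded version of the readiness loop quoted in the diff could look roughly like the following. This is only a sketch, not the code from the pull request: `StartupWait`, `waitUntilReady`, and its parameters are hypothetical names, and the readiness check is passed in as a `BooleanSupplier` so the example stays free of the Elasticsearch client.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: polls a readiness check, but gives up after a maximum wait. */
final class StartupWait {
  private StartupWait() {}

  /**
   * Polls {@code ready} every {@code pollMillis} until it returns true.
   * Returns false if {@code maxWaitMillis} elapses first or the thread is interrupted.
   */
  static boolean waitUntilReady(BooleanSupplier ready, long maxWaitMillis, long pollMillis) {
    long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
    while (!ready.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // startup took too long, stop waiting
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return true;
  }
}
```

In the test setup the supplier would wrap the `HEAD /` request from the diff (with its checked exception handled), and the caller would fail the setup when the method returns false.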


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 138804)
Time Spent: 1h  (was: 50m)

> org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest is flaky
> -
>
> Key: BEAM-5172
> URL: https://issues.apache.org/jira/browse/BEAM-5172
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-elasticsearch, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In a recent PostCommit build, 
> https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1290/testReport/junit/org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest/testRead/
>  failed with:
> Error Message
> java.lang.AssertionError: Count/Flatten.PCollections.out: 
> Expected: <400L>
>  but: was <470L>
> Stacktrace
> java.lang.AssertionError: Count/Flatten.PCollections.out: 
> Expected: <400L>
>  but: was <470L>
>   at 
> org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:168)
>   at org.apache.beam.sdk.testing.PAssert.thatSingleton(PAssert.java:413)
>   at org.apache.beam.sdk.testing.PAssert.thatSingleton(PAssert.java:404)
>   at 
> org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.testRead(ElasticsearchIOTestCommon.java:124)
>   at 
> org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testRead(ElasticsearchIOTest.java:125)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.ExternalResource$1.e

[jira] [Work logged] (BEAM-5172) org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest is flaky

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5172?focusedWorklogId=138797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138797
 ]

ASF GitHub Bot logged work on BEAM-5172:


Author: ASF GitHub Bot
Created on: 28/Aug/18 13:06
Start Date: 28/Aug/18 13:06
Worklog Time Spent: 10m 
  Work Description: echauchot commented on a change in pull request #6279:  
[BEAM-5172] Fix Elasticsearch UTests flakiness
URL: https://github.com/apache/beam/pull/6279#discussion_r213304721
 
 

 ##
 File path: 
sdks/java/io/elasticsearch-tests/elasticsearch-tests-2/src/test/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIOTest.java
 ##
 @@ -101,6 +101,14 @@ public static void beforeClass() throws IOException {
 restClient = connectionConfiguration.createClient();
 elasticsearchIOTestCommon =
 new ElasticsearchIOTestCommon(connectionConfiguration, restClient, false);
+while (restClient.performRequest("HEAD", "/").getStatusLine().getStatusCode() != 200) {
 
 Review comment:
   I added a maximum startup wait time in the second commit.




Issue Time Tracking
---

Worklog Id: (was: 138797)
Time Spent: 50m  (was: 40m)

> org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest is flaky
> -
>
> Key: BEAM-5172
> URL: https://issues.apache.org/jira/browse/BEAM-5172
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-elasticsearch, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In a recent PostCommit build, 
> https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1290/testReport/junit/org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest/testRead/
>  failed with:
> Error Message
> java.lang.AssertionError: Count/Flatten.PCollections.out: 
> Expected: <400L>
>  but: was <470L>
> Stacktrace
> java.lang.AssertionError: Count/Flatten.PCollections.out: 
> Expected: <400L>
>  but: was <470L>
>   at 
> org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:168)
>   at org.apache.beam.sdk.testing.PAssert.thatSingleton(PAssert.java:413)
>   at org.apache.beam.sdk.testing.PAssert.thatSingleton(PAssert.java:404)
>   at 
> org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.testRead(ElasticsearchIOTestCommon.java:124)
>   at 
> org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testRead(ElasticsearchIOTest.java:125)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.ExternalResource$1.e

[jira] [Commented] (BEAM-4952) Beam Dependency Update Request: org.apache.hbase:hbase-hadoop-compat 2.1.0

2018-08-28 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594936#comment-16594936
 ] 

Tim Robertson commented on BEAM-4952:
-

Thanks [~chamikara]

Specifically on this one: I'm reluctant to bump HBase while the discussion on 
dev@ is underway. I'm concerned that Beam could alienate Hadoop users if we 
don't support the versions in use by Amazon EMR, Cloudera, Hortonworks, etc. 

Do you use HBase 2.1 yourself?

> Beam Dependency Update Request: org.apache.hbase:hbase-hadoop-compat 2.1.0
> --
>
> Key: BEAM-4952
> URL: https://issues.apache.org/jira/browse/BEAM-4952
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tim Robertson
>Priority: Major
>
> 2018-07-25 20:28:24.987897
> Please review and upgrade the org.apache.hbase:hbase-hadoop-compat to 
> the latest version 2.1.0 
>  
> cc: 
> 2018-08-06 12:11:58.406173
> Please review and upgrade the org.apache.hbase:hbase-hadoop-compat to 
> the latest version 2.1.0 
>  
> cc: 
> 2018-08-13 12:13:31.045787
> Please review and upgrade the org.apache.hbase:hbase-hadoop-compat to 
> the latest version 2.1.0 
>  
> cc: 
> 2018-08-20 12:14:04.735400
> Please review and upgrade the org.apache.hbase:hbase-hadoop-compat to 
> the latest version 2.1.0 
>  
> cc: 
> 2018-08-27 12:15:07.483727
> Please review and upgrade the org.apache.hbase:hbase-hadoop-compat to 
> the latest version 2.1.0 
>  
> cc: 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4861) Hadoop Filesystem silently fails

2018-08-28 Thread Tim Robertson (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Robertson reassigned BEAM-4861:
---

Assignee: Tim Robertson  (was: Chamikara Jayalath)

> Hadoop Filesystem silently fails
> 
>
> Key: BEAM-4861
> URL: https://issues.apache.org/jira/browse/BEAM-4861
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hadoop
>Reporter: Jozef Vilcek
>Assignee: Tim Robertson
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hi,
> Beam FileSystem operations copy, rename and delete return void in the SDK. The 
> native Hadoop filesystem operations do not; they return a boolean. The current 
> implementation in Beam ignores the result and passes as long as no exception is thrown.
> I got burned by this when using 'rename' to do a 'move' operation on HDFS. If the 
> target directory does not exist, the operation returns false and does not touch 
> the file.
> [https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java#L148]





[jira] [Work logged] (BEAM-4861) Hadoop Filesystem silently fails

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4861?focusedWorklogId=138783&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138783
 ]

ASF GitHub Bot logged work on BEAM-4861:


Author: ASF GitHub Bot
Created on: 28/Aug/18 12:40
Start Date: 28/Aug/18 12:40
Worklog Time Spent: 10m 
  Work Description: timrobertson100 opened a new pull request #6285: 
[BEAM-4861] Autocreate directories when doing an HDFS rename
URL: https://github.com/apache/beam/pull/6285
 
 
   Improves HDFS filesystem operations by
   
   1. Auto-creating missing parent directories for the HDFS rename()
   2. Throwing an error if a copy operation indicates it was not successful 
(precautionary)
   
   This paves the way for BEAM-5036, which will improve performance on HDFS 
by using rename instead of copy and delete to move the temporary files 
into place.
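
The two points above can be illustrated with a small, self-contained sketch. It does not use the Hadoop API and is not the code in this pull request: `BooleanFs`, `InMemoryFs`, and `renameCreatingParents` are hypothetical names, and the in-memory filesystem only mimics the "return false instead of throwing" behaviour described in the JIRA issue.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

final class RenameSketch {
  /** Hypothetical stand-in for a filesystem whose operations report failure by returning false. */
  interface BooleanFs {
    boolean exists(String path);
    boolean mkdirs(String dir);
    boolean rename(String src, String dst);
  }

  static String parentOf(String path) {
    int slash = path.lastIndexOf('/');
    return slash > 0 ? path.substring(0, slash) : "/";
  }

  /** Creates the destination's parent if needed, then renames; a false result becomes an exception. */
  static void renameCreatingParents(BooleanFs fs, String src, String dst) throws IOException {
    String parent = parentOf(dst);
    if (!fs.exists(parent) && !fs.mkdirs(parent)) {
      throw new IOException("Unable to create parent directory " + parent);
    }
    if (!fs.rename(src, dst)) {
      throw new IOException("Unable to rename " + src + " to " + dst);
    }
  }

  /** Tiny in-memory implementation, used only to exercise the sketch. */
  static final class InMemoryFs implements BooleanFs {
    final Set<String> dirs = new HashSet<>();
    final Set<String> files = new HashSet<>();

    InMemoryFs() {
      dirs.add("/");
    }

    @Override
    public boolean exists(String path) {
      return dirs.contains(path) || files.contains(path);
    }

    @Override
    public boolean mkdirs(String dir) {
      dirs.add(dir);
      return true;
    }

    @Override
    public boolean rename(String src, String dst) {
      // Mimics the silent failure: a missing source or target directory yields false.
      if (!files.contains(src) || !dirs.contains(parentOf(dst))) {
        return false;
      }
      files.remove(src);
      files.add(dst);
      return true;
    }
  }
}
```

With this shape, a rename into a directory that does not yet exist succeeds, while any remaining `false` result surfaces as an `IOException` instead of being ignored.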
   
   PTAL @JozoVilcek & @reuvenlax 
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 138783)
Time Spent: 10m
Remaining Estimate: 0h

> Hadoop Filesystem silently fails
> 
>
> Key: BEAM-4861
> URL: https://issues.apa

[jira] [Commented] (BEAM-4861) Hadoop Filesystem silently fails

2018-08-28 Thread Jozef Vilcek (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594868#comment-16594868
 ] 

Jozef Vilcek commented on BEAM-4861:


Yes, makes sense to me.

> Hadoop Filesystem silently fails
> 
>
> Key: BEAM-4861
> URL: https://issues.apache.org/jira/browse/BEAM-4861
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hadoop
>Reporter: Jozef Vilcek
>Assignee: Chamikara Jayalath
>Priority: Major
>
> Hi,
> Beam FileSystem operations copy, rename and delete return void in the SDK. The 
> native Hadoop filesystem operations do not; they return a boolean. The current 
> implementation in Beam ignores the result and passes as long as no exception is thrown.
> I got burned by this when using 'rename' to do a 'move' operation on HDFS. If the 
> target directory does not exist, the operation returns false and does not touch 
> the file.
> [https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java#L148]





[jira] [Comment Edited] (BEAM-4861) Hadoop Filesystem silently fails

2018-08-28 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594794#comment-16594794
 ] 

Tim Robertson edited comment on BEAM-4861 at 8/28/18 10:33 AM:
---

On further inspection I think {{delete}} is correct to swallow a {{false}} 
response [~JozoVilcek] 
 * A {{delete}}, for example, will return {{false}} when you try to delete a 
nonexistent file, which seems reasonable to swallow. It will throw an exception for 
the scenarios that matter.

For {{copy}} it seems to make little difference, so we might as well throw an 
exception to be cautious:
 * The {{copy}} returns false only if there is an issue with {{mkdirs}}, and the 
HDFS docs [1] state that it always returns true.

For {{rename()}} we can create the directory if it does not exist and then should 
throw an exception on any response that is false. 
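
As a rough illustration of the policy described above (tolerate {{false}} from delete, treat {{false}} from copy and rename as an error), here is a minimal sketch. `ResultPolicy`, `Op`, and `check` are hypothetical names chosen for this example; they are not part of Beam or Hadoop.

```java
import java.io.IOException;

final class ResultPolicy {
  /** Which operations treat a false result as a failure, per the comment above. */
  enum Op {
    DELETE(false), // false can simply mean "nothing to delete"
    COPY(true),    // thrown only as a precaution
    RENAME(true);  // false means the file was not moved

    final boolean failOnFalse;

    Op(boolean failOnFalse) {
      this.failOnFalse = failOnFalse;
    }
  }

  /** Returns the result unchanged, or throws when the policy says false is an error. */
  static boolean check(Op op, boolean result) throws IOException {
    if (!result && op.failOnFalse) {
      throw new IOException(op + " returned false");
    }
    return result;
  }
}
```

A caller would wrap each boolean-returning filesystem call, for example `check(Op.RENAME, renamed)`, where `renamed` is whatever the underlying call returned.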

 

[1] 
[https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_renamePath_src_Path_d]
 

 


was (Author: timrobertson100):
On further inspection I think {{delete}} and {{copy}} are correct to swallow a 
{{false}} response [~JozoVilcek] 
 * A {{delete}}, for example, will return {{false}} when you try to delete a 
nonexistent file, which seems reasonable to swallow. It will throw an exception for 
the scenarios that matter.
 * The {{copy}} returns false only if there is an issue with {{mkdirs}}, and the 
HDFS docs [1] state that it always returns true even if the directory is not 
created. I think we can ignore the local filesystem implementation.

For {{rename()}} we can create the directory if it does not exist and then should 
throw an exception on any response that is false. 

 

[1] 
[https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_renamePath_src_Path_d]
 

 

> Hadoop Filesystem silently fails
> 
>
> Key: BEAM-4861
> URL: https://issues.apache.org/jira/browse/BEAM-4861
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hadoop
>Reporter: Jozef Vilcek
>Assignee: Chamikara Jayalath
>Priority: Major
>
> Hi,
> Beam FileSystem operations copy, rename and delete return void in the SDK. The 
> native Hadoop filesystem operations do not; they return a boolean. The current 
> implementation in Beam ignores the result and passes as long as no exception is thrown.
> I got burned by this when using 'rename' to do a 'move' operation on HDFS. If the 
> target directory does not exist, the operation returns false and does not touch 
> the file.
> [https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java#L148]





[jira] [Commented] (BEAM-4861) Hadoop Filesystem silently fails

2018-08-28 Thread Tim Robertson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594794#comment-16594794
 ] 

Tim Robertson commented on BEAM-4861:
-

On further inspection I think {{delete}} and {{copy}} are correct to swallow a 
{{false}} response [~JozoVilcek] 
 * A {{delete}}, for example, will return {{false}} when you try to delete a 
nonexistent file, which seems reasonable to swallow. It will throw an exception for 
the scenarios that matter.
 * The {{copy}} returns false only if there is an issue with {{mkdirs}}, and the 
HDFS docs [1] state that it always returns true even if the directory is not 
created. I think we can ignore the local filesystem implementation.

For {{rename()}} we can create the directory if it does not exist and then should 
throw an exception on any response that is false. 

 

[1] 
[https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_renamePath_src_Path_d]
 

 

> Hadoop Filesystem silently fails
> 
>
> Key: BEAM-4861
> URL: https://issues.apache.org/jira/browse/BEAM-4861
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hadoop
>Reporter: Jozef Vilcek
>Assignee: Chamikara Jayalath
>Priority: Major
>
> Hi,
> Beam FileSystem operations copy, rename and delete return void in the SDK. The 
> native Hadoop filesystem operations do not; they return a boolean. The current 
> implementation in Beam ignores the result and passes as long as no exception is thrown.
> I got burned by this when using 'rename' to do a 'move' operation on HDFS. If the 
> target directory does not exist, the operation returns false and does not touch 
> the file.
> [https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystem.java#L148]





[jira] [Work logged] (BEAM-5247) Remove slf4j-simple binding from dependencies

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5247?focusedWorklogId=138737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138737
 ]

ASF GitHub Bot logged work on BEAM-5247:


Author: ASF GitHub Bot
Created on: 28/Aug/18 10:19
Start Date: 28/Aug/18 10:19
Worklog Time Spent: 10m 
  Work Description: JozoVilcek opened a new pull request #6284: [BEAM-5247] 
Remove slf4j-simple binding from dependencies
URL: https://github.com/apache/beam/pull/6284
 
 
   




Issue Time Tracking
---

Worklog Id: (was: 138737)
Time Spent: 10m
Remaining Estimate: 0h

> Remove slf4j-simple binding from dependencies
> -
>
> Key: BEAM-5247
> URL: https://issues.apache.org/jira/browse/BEAM-5247
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Jozef Vilcek
>Assignee: Aljoscha Krettek
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Flink runner declares a slf4j-simple binding in dependencies. This can break 
> logging in applications that have their own binding and do not exclude 
> this one from Beam.





[jira] [Created] (BEAM-5247) Remove slf4j-simple binding from dependencies

2018-08-28 Thread Jozef Vilcek (JIRA)
Jozef Vilcek created BEAM-5247:
--

 Summary: Remove slf4j-simple binding from dependencies
 Key: BEAM-5247
 URL: https://issues.apache.org/jira/browse/BEAM-5247
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Jozef Vilcek
Assignee: Aljoscha Krettek


Flink runner declares a slf4j-simple binding in dependencies. This can break 
logging in applications that have their own binding and do not exclude this 
one from Beam.





[jira] [Work logged] (BEAM-3820) SolrIO: Allow changing batchSize for writes

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3820?focusedWorklogId=138735&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138735
 ]

ASF GitHub Bot logged work on BEAM-3820:


Author: ASF GitHub Bot
Created on: 28/Aug/18 10:16
Start Date: 28/Aug/18 10:16
Worklog Time Spent: 10m 
  Work Description: aalbatross commented on issue #6283: [BEAM-3820] 
Exposing batchSize for SolrIO Writes
URL: https://github.com/apache/beam/pull/6283#issuecomment-416529948
 
 
   @iemejia Please take a look.




Issue Time Tracking
---

Worklog Id: (was: 138735)
Time Spent: 0.5h  (was: 20m)

> SolrIO: Allow changing batchSize for writes
> ---
>
> Key: BEAM-3820
> URL: https://issues.apache.org/jira/browse/BEAM-3820
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-solr
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Tim Robertson
>Assignee: Ravi Pathak
>Priority: Trivial
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The SolrIO hard codes the batchSize for writes at 1000.  It would be a good 
> addition to allow the user to set the batchSize explicitly (similar to the 
> ElasticsearchIO)





[jira] [Work logged] (BEAM-3820) SolrIO: Allow changing batchSize for writes

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3820?focusedWorklogId=138727&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138727
 ]

ASF GitHub Bot logged work on BEAM-3820:


Author: ASF GitHub Bot
Created on: 28/Aug/18 09:44
Start Date: 28/Aug/18 09:44
Worklog Time Spent: 10m 
  Work Description: aalbatross commented on issue #6283: [BEAM-3820] 
Exposing batchSize for SolrIO Writes
URL: https://github.com/apache/beam/pull/6283#issuecomment-416519756
 
 
   @timrobertson100 please review this.




Issue Time Tracking
---

Worklog Id: (was: 138727)
Time Spent: 20m  (was: 10m)

> SolrIO: Allow changing batchSize for writes
> ---
>
> Key: BEAM-3820
> URL: https://issues.apache.org/jira/browse/BEAM-3820
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-solr
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Tim Robertson
>Assignee: Ravi Pathak
>Priority: Trivial
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The SolrIO hard codes the batchSize for writes at 1000.  It would be a good 
> addition to allow the user to set the batchSize explicitly (similar to the 
> ElasticsearchIO)





[jira] [Work logged] (BEAM-3820) SolrIO: Allow changing batchSize for writes

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3820?focusedWorklogId=138726&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138726
 ]

ASF GitHub Bot logged work on BEAM-3820:


Author: ASF GitHub Bot
Created on: 28/Aug/18 09:43
Start Date: 28/Aug/18 09:43
Worklog Time Spent: 10m 
  Work Description: aalbatross opened a new pull request #6283: [BEAM-3820] 
Exposing batchSize for SolrIO Writes
URL: https://github.com/apache/beam/pull/6283
 
 
   Exposes the batch size for SolrIO writes by making the withMaxBatchSize() method 
public.
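
To show what a configurable batch size changes in practice, here is a small, self-contained sketch of a writer that buffers elements and flushes whenever the configured maximum is reached. It is not SolrIO's implementation: `BatchingWriter`, `to`, `write`, and `flush` are hypothetical names for this example. Only the method name `withMaxBatchSize` and the default of 1000 are taken from the issue text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical writer that groups elements into batches before handing them to a sink. */
final class BatchingWriter<T> {
  private final int maxBatchSize;
  private final Consumer<List<T>> sink;
  private final List<T> buffer = new ArrayList<>();

  private BatchingWriter(int maxBatchSize, Consumer<List<T>> sink) {
    this.maxBatchSize = maxBatchSize;
    this.sink = sink;
  }

  /** Starts with the default of 1000 mentioned in the issue description. */
  static <T> BatchingWriter<T> to(Consumer<List<T>> sink) {
    return new BatchingWriter<>(1000, sink);
  }

  /** Returns a writer with the same sink but a caller-chosen batch size. */
  BatchingWriter<T> withMaxBatchSize(int maxBatchSize) {
    if (maxBatchSize <= 0) {
      throw new IllegalArgumentException("maxBatchSize must be positive");
    }
    return new BatchingWriter<>(maxBatchSize, sink);
  }

  /** Buffers one element and flushes once the buffer reaches the maximum size. */
  void write(T element) {
    buffer.add(element);
    if (buffer.size() >= maxBatchSize) {
      flush();
    }
  }

  /** Sends any buffered elements to the sink as one batch. */
  void flush() {
    if (!buffer.isEmpty()) {
      sink.accept(new ArrayList<>(buffer));
      buffer.clear();
    }
  }
}
```

Writing seven elements with a maximum batch size of 3 and then calling `flush()` hands the sink three batches, of sizes 3, 3 and 1.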
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 138726)
Time Spent: 10m
Remaining Estimate: 0h

> SolrIO: Allow changing batchSize for writes
> ---
>
> Key: BEAM-3820
> URL: https://issues.apache.org/jira/browse/BEAM-3820
> Project: Beam
> Issue Type: Improvement
> Components: io-java-solr
> Affects Versions: 2.2.0, 2.3.0
> Reporter: Tim Robertson
> Assignee: Ravi Pathak
> Priority: Trivial
> Time Spent: 10m
> Remaining Estimate: 0h
>
> T

[jira] [Work logged] (BEAM-5172) org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest is flaky

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5172?focusedWorklogId=138721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138721
 ]

ASF GitHub Bot logged work on BEAM-5172:


Author: ASF GitHub Bot
Created on: 28/Aug/18 09:25
Start Date: 28/Aug/18 09:25
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #6279:  [BEAM-5172] Fix 
Elasticsearch UTests flakiness
URL: https://github.com/apache/beam/pull/6279#issuecomment-416514386
 
 
   @Ardagan @mxm ES_INDEX was static, so adding the thread id had no effect 
for parallel threads. I replaced it with a static method. That way, even if test 
methods run in parallel in the same JVM, the ES index name is constructed for 
each method with the current thread id.
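
The difference between the two approaches can be sketched in plain Java. This is a minimal illustration and not the code from the pull request: the class name, the `"beam"` prefix, and the helper `namesSeenBy` are assumptions made for the example. A static field is initialized once by whichever thread loads the class, while a static method is evaluated by each calling thread.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.function.Supplier;

// Hypothetical names throughout; only the field-versus-method contrast comes from the comment above.
final class EsIndexNameSketch {
  // Initialized once, by whichever thread loads the class; later threads all see that value.
  static final String ES_INDEX_FIELD = "beam" + Thread.currentThread().getId();

  // Evaluated on every call, so the name carries the id of the calling thread.
  static String getEsIndex() {
    return "beam" + Thread.currentThread().getId();
  }

  // Runs `threads` concurrent workers and collects the index name each one observes.
  static Set<String> namesSeenBy(int threads, Supplier<String> nameSource) {
    Set<String> names = ConcurrentHashMap.newKeySet();
    CountDownLatch allRecorded = new CountDownLatch(threads);
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        names.add(nameSource.get());
        allRecorded.countDown();
        try {
          allRecorded.await(); // keep every worker alive so thread ids cannot be reused
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      workers[i].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println("names via static field:  " + namesSeenBy(4, () -> ES_INDEX_FIELD));
    System.out.println("names via static method: " + namesSeenBy(4, EsIndexNameSketch::getEsIndex));
  }
}
```

With four workers, the field variant yields a single shared name and the method variant yields one name per thread, which is what keeps parallel test methods from writing to the same index.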




Issue Time Tracking
---

Worklog Id: (was: 138721)
Time Spent: 40m  (was: 0.5h)

> org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest is flaky
> -
>
> Key: BEAM-5172
> URL: https://issues.apache.org/jira/browse/BEAM-5172
> Project: Beam
> Issue Type: Bug
> Components: io-java-elasticsearch, test-failures
> Reporter: Valentyn Tymofieiev
> Assignee: Etienne Chauchot
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> In a recent PostCommit build, 
> https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1290/testReport/junit/org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest/testRead/
>  failed with:
> Error Message
> java.lang.AssertionError: Count/Flatten.PCollections.out: 
> Expected: <400L>
>  but: was <470L>
> Stacktrace
> java.lang.AssertionError: Count/Flatten.PCollections.out: 
> Expected: <400L>
>  but: was <470L>
>   at 
> org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:168)
>   at org.apache.beam.sdk.testing.PAssert.thatSingleton(PAssert.java:413)
>   at org.apache.beam.sdk.testing.PAssert.thatSingleton(PAssert.java:404)
>   at 
> org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTestCommon.testRead(ElasticsearchIOTestCommon.java:124)
>   at 
> org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testRead(ElasticsearchIOTest.java:125)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
>   at 
> org.gradle.api.internal.tas

[jira] [Commented] (BEAM-4418) Improve gradle integration with IntelliJ

2018-08-28 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594744#comment-16594744
 ] 

Ismaël Mejía commented on BEAM-4418:


BEAM-4045 contains the ones mentioned by Etienne and a complete list of 
improvements after the move to gradle.

> Improve gradle integration with IntelliJ
> 
>
> Key: BEAM-4418
> URL: https://issues.apache.org/jira/browse/BEAM-4418
> Project: Beam
> Issue Type: Sub-task
> Components: build-system
> Reporter: Etienne Chauchot
> Priority: Major
> Fix For: Not applicable
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> To work efficiently with Gradle, the integration with IntelliJ 
> (the most common IDE in the community, I think) needs to be improved. The aim of 
> this ticket is to gather areas of improvement that people have discovered. Feel free 
> to comment on what you discovered.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3310) Push metrics to a backend in an runner agnostic way

2018-08-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3310?focusedWorklogId=138717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138717
 ]

ASF GitHub Bot logged work on BEAM-3310:


Author: ASF GitHub Bot
Created on: 28/Aug/18 09:15
Start Date: 28/Aug/18 09:15
Worklog Time Spent: 10m 
  Work Description: JozoVilcek commented on issue #4548: [BEAM-3310] 
Metrics pusher
URL: https://github.com/apache/beam/pull/4548#issuecomment-416511465
 
 
   FYI, https://issues.apache.org/jira/browse/BEAM-5246
   
   There is still an open question about how MetricPusher should work for detached jobs ...




Issue Time Tracking
---

Worklog Id: (was: 138717)
Time Spent: 18h 40m  (was: 18.5h)

> Push metrics to a backend in an runner agnostic way
> ---
>
> Key: BEAM-3310
> URL: https://issues.apache.org/jira/browse/BEAM-3310
> Project: Beam
> Issue Type: New Feature
> Components: runner-extensions-metrics, sdk-java-core
> Reporter: Etienne Chauchot
> Assignee: Etienne Chauchot
> Priority: Major
> Time Spent: 18h 40m
> Remaining Estimate: 0h
>
> The idea is to avoid relying on the runners to provide access to the metrics 
> (either at the end of the pipeline or while it runs) because they do not all 
> have the same metrics capabilities (e.g. the Spark runner configures sinks 
> such as CSV, Graphite or in-memory sinks using the Spark engine conf). The 
> target is to push the metrics from the common runner code so that, no matter the 
> chosen runner, users can get their metrics out of Beam.
> Here is the link to the discussion thread on the dev ML: 
> https://lists.apache.org/thread.html/01a80d62f2df6b84bfa41f05e15fda900178f882877c294fed8be91e@%3Cdev.beam.apache.org%3E
> And the design doc:
> https://s.apache.org/runner_independent_metrics_extraction
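
The idea described in the issue can be sketched in a few lines of plain Java. This is only an illustration of the shape of such a component, not the design from the linked document: `Sink`, `Pusher`, and `pushOnce` are hypothetical names, and the metrics are modelled as a simple map rather than Beam's metrics types. The point is that the pusher depends only on a way to read the current metrics, so it can live in code shared by all runners.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; none of these types are Beam APIs.
final class MetricsPushSketch {
  /** Backend abstraction: anything that can receive a snapshot of metric values. */
  interface Sink {
    void write(Map<String, Long> snapshot);
  }

  /** Runner-independent component: reads the current metrics and hands a copy to the sink. */
  static final class Pusher {
    private final Supplier<Map<String, Long>> metricsSource;
    private final Sink sink;

    Pusher(Supplier<Map<String, Long>> metricsSource, Sink sink) {
      this.metricsSource = metricsSource;
      this.sink = sink;
    }

    void pushOnce() {
      // Copy the values so later updates do not alter what was already pushed.
      sink.write(new HashMap<>(metricsSource.get()));
    }
  }

  public static void main(String[] args) {
    Map<String, Long> liveMetrics = new HashMap<>();
    List<Map<String, Long>> received = new ArrayList<>();
    Pusher pusher = new Pusher(() -> liveMetrics, received::add);

    liveMetrics.put("elementsRead", 5L);
    pusher.pushOnce();
    liveMetrics.put("elementsRead", 12L);
    pusher.pushOnce();

    System.out.println(received);
  }
}
```

In a running pipeline, `pushOnce` would presumably be invoked periodically, for example from a `ScheduledExecutorService`, and once more when the pipeline finishes.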





[jira] [Created] (BEAM-5246) Beam metrics exported as flink metrics are not correct

2018-08-28 Thread Jozef Vilcek (JIRA)
Jozef Vilcek created BEAM-5246:
--

 Summary: Beam metrics exported as flink metrics are not correct
 Key: BEAM-5246
 URL: https://issues.apache.org/jira/browse/BEAM-5246
 Project: Beam
  Issue Type: Bug
  Components: runner-flink
Affects Versions: 2.6.0
Reporter: Jozef Vilcek
Assignee: Aljoscha Krettek


In the Flink UI and the Flink native MetricReporter, I am seeing too many instances of 
my Beam metric counter. It looks like the counter is materialised for every 
operator running within the task, although it is emitted from only one Beam 
step (which should map to one operator?). This produces double counting.

After a bit of debugging, I noticed this happens for streaming jobs. In batch I was not 
able to reproduce it. The problem might be in FlinkMetricContainer.

[https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainer.java#L86]

The update seems to be called from operators after finishing the bundle. Data 
from the accumulator is flushed to `runtimeContext.getMetricGroup()`. The scope of 
the accumulator seems to differ from that of the metricGroup: across different 
calls the scope components change, especially the operatorID. It seems that 
during the run, `metricResult.getStep()` does not match the operatorName of the 
metricGroup to which the metric is being pushed.
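
The suspected mechanism can be modelled without any Flink or Beam dependency. The sketch below is a simplified model of the symptom described above, under the assumption that a task-wide accumulator is flushed by every operator into that operator's own metric group. All class, method, and metric names are hypothetical, and this is not the actual FlinkMetricContainer code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model; none of these names are Flink or Beam APIs.
final class DoubleCountingSketch {
  /**
   * Each operator flushes the whole task-wide accumulator into its own group,
   * so every counter is registered once per operator.
   * Keys of the result are "operator.step.counter".
   */
  static Map<String, Long> flushFromEveryOperator(
      Map<String, Long> taskAccumulator, List<String> operators) {
    Map<String, Long> reported = new HashMap<>();
    for (String operator : operators) {
      for (Map.Entry<String, Long> entry : taskAccumulator.entrySet()) {
        reported.put(operator + "." + entry.getKey(), entry.getValue());
      }
    }
    return reported;
  }

  /** What a backend sees if it sums the counter over all reported instances. */
  static long sum(Map<String, Long> reported) {
    long total = 0;
    for (long value : reported.values()) {
      total += value;
    }
    return total;
  }

  public static void main(String[] args) {
    Map<String, Long> accumulator = new HashMap<>();
    accumulator.put("stepA.myCounter", 10L); // emitted by a single Beam step

    List<String> operators = Arrays.asList("opA", "opB", "opC"); // chained in one task
    Map<String, Long> reported = flushFromEveryOperator(accumulator, operators);

    System.out.println("instances reported: " + reported.size());
    System.out.println("summed value:       " + sum(reported));
  }
}
```

In this model a counter with value 10 emitted by one step shows up under three operators, so a backend that aggregates over operators sees 30.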




