[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=418010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-418010
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 07/Apr/20 21:57
Start Date: 07/Apr/20 21:57
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 418010)
Time Spent: 8h 10m  (was: 8h)

> Enumerating artifacts is too expensive in Java
> --
>
> Key: BEAM-9578
> URL: https://issues.apache.org/jira/browse/BEAM-9578
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Heejong Lee
>Priority: Critical
>  Labels: portability
> Fix For: 2.21.0
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which 
> effectively call *Environments#createOrGetDefaultEnvironment* which causes 
> [artifacts to be 
> computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114].
> This leads to zipping directories for non-jar dependencies.
> Similar problems may exist for Python/Go.
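
One general way to keep repeated calls from redoing expensive work is to memoize the computation so it runs at most once. The sketch below is illustrative only and is not the change made in the pull request: `LazyArtifacts`, `memoize`, `computeArtifacts`, and the counter are hypothetical names rather than Beam APIs, and a list of strings stands in for real artifact metadata.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical sketch: compute an expensive artifact list at most once. */
final class LazyArtifacts {
  /** Counts how often the expensive step actually runs (for illustration). */
  static final AtomicInteger computations = new AtomicInteger();

  /** Wraps a supplier so its result is computed on first use and then cached. */
  static <T> Supplier<T> memoize(Supplier<T> delegate) {
    return new Supplier<T>() {
      private T value;
      private boolean initialized;

      @Override
      public synchronized T get() {
        if (!initialized) {
          value = delegate.get();
          initialized = true;
        }
        return value;
      }
    };
  }

  /** Stand-in for the costly work (zipping directories, hashing files). */
  static List<String> computeArtifacts() {
    computations.incrementAndGet();
    return Arrays.asList("app.jar", "classes.zip");
  }

  public static void main(String[] args) {
    Supplier<List<String>> artifacts = memoize(LazyArtifacts::computeArtifacts);
    artifacts.get();
    artifacts.get(); // served from the cached value
    System.out.println("computations = " + computations.get());
  }
}
```

With this shape, callers that only need the environment description never trigger the expensive step, and callers that do need the artifacts pay for it once.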



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=418008&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-418008
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 07/Apr/20 21:56
Start Date: 07/Apr/20 21:56
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r405136162
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
 ##
 @@ -207,31 +210,94 @@ public static Environment createProcessEnvironment(
     }
   }
 
-  public static Collection getArtifacts(PipelineOptions options) {
-    Set pathsToStage = Sets.newHashSet();
-    List stagingFiles = options.as(PortablePipelineOptions.class).getFilesToStage();
-    if (stagingFiles != null) {
-      pathsToStage.addAll(stagingFiles);
-    }
-
-    ImmutableList.Builder filesToStage = ImmutableList.builder();
+  private static List getArtifacts(List stagingFiles) {
+    Set pathsToStage = Sets.newHashSet(stagingFiles);
+    ImmutableList.Builder artifactsBuilder = ImmutableList.builder();
     for (String path : pathsToStage) {
       File file = new File(path);
-      if (new File(path).exists()) {
-        // Spurious items get added to the classpath. Filter by just those that exist.
+      // Spurious items get added to the classpath. Filter by just those that exist.
+      if (file.exists()) {
+        ArtifactInformation.Builder artifactBuilder = ArtifactInformation.newBuilder();
+        artifactBuilder.setTypeUrn(BeamUrns.getUrn(StandardArtifacts.Types.FILE));
+        artifactBuilder.setRoleUrn(BeamUrns.getUrn(StandardArtifacts.Roles.STAGING_TO));
+        artifactBuilder.setRolePayload(
+            RunnerApi.ArtifactStagingToRolePayload.newBuilder()
+                .setStagedName(createStagingFileName(file))
+                .build()
+                .toByteString());
         if (file.isDirectory()) {
-          // Zip up directories so we can upload them to the artifact service.
+          File zippedFile;
 
 Review comment:
   Nit, there seems to be a fair amount of duplication between these two cases.
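
A minimal sketch of how the two branches could share one code path: only the choice of file to stage differs (the zipped directory or the file itself), so hashing and payload construction can live in a single helper. Everything here is an assumption for illustration. `ArtifactHashing`, `Zipper`, and `describe` are hypothetical names, the JDK's `MessageDigest` is swapped in for the Guava hashing used in the diff, and a plain string stands in for the `ArtifactInformation` proto.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: one shared hashing path for files and zipped directories. */
final class ArtifactHashing {
  /** Hypothetical stand-in for the zipDirectory(file) helper seen in the diff. */
  interface Zipper {
    Path zip(Path directory) throws IOException;
  }

  /** Streams the file through SHA-256 and returns the lowercase hex digest. */
  static String sha256Hex(Path file) throws IOException {
    MessageDigest digest;
    try {
      digest = MessageDigest.getInstance("SHA-256");
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 must be supported by every Java platform implementation.
      throw new IllegalStateException(e);
    }
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buffer = new byte[8192];
      int read;
      while ((read = in.read(buffer)) != -1) {
        digest.update(buffer, 0, read);
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : digest.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  /** Picks the file to stage, then runs the shared hash-and-describe step. */
  static String describe(Path path, Zipper zipper) throws IOException {
    Path toStage = Files.isDirectory(path) ? zipper.zip(path) : path;
    return toStage + " sha256=" + sha256Hex(toStage);
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("artifact", ".bin");
    Files.write(file, "abc".getBytes(StandardCharsets.UTF_8));
    // The zipper is never invoked here because the path is a regular file.
    System.out.println(describe(file, dir -> dir));
  }
}
```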
 



Issue Time Tracking
---

Worklog Id: (was: 418008)
Time Spent: 7h 50m  (was: 7h 40m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=418009&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-418009
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 07/Apr/20 21:56
Start Date: 07/Apr/20 21:56
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r405136418
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
 ##
 @@ -207,31 +210,94 @@ public static Environment createProcessEnvironment(
     }
   }
 
-  public static Collection getArtifacts(PipelineOptions options) {
-    Set pathsToStage = Sets.newHashSet();
-    List stagingFiles = options.as(PortablePipelineOptions.class).getFilesToStage();
-    if (stagingFiles != null) {
-      pathsToStage.addAll(stagingFiles);
-    }
-
-    ImmutableList.Builder filesToStage = ImmutableList.builder();
+  private static List getArtifacts(List stagingFiles) {
+    Set pathsToStage = Sets.newHashSet(stagingFiles);
+    ImmutableList.Builder artifactsBuilder = ImmutableList.builder();
     for (String path : pathsToStage) {
       File file = new File(path);
-      if (new File(path).exists()) {
-        // Spurious items get added to the classpath. Filter by just those that exist.
+      // Spurious items get added to the classpath. Filter by just those that exist.
+      if (file.exists()) {
+        ArtifactInformation.Builder artifactBuilder = ArtifactInformation.newBuilder();
+        artifactBuilder.setTypeUrn(BeamUrns.getUrn(StandardArtifacts.Types.FILE));
+        artifactBuilder.setRoleUrn(BeamUrns.getUrn(StandardArtifacts.Roles.STAGING_TO));
+        artifactBuilder.setRolePayload(
+            RunnerApi.ArtifactStagingToRolePayload.newBuilder()
+                .setStagedName(createStagingFileName(file))
+                .build()
+                .toByteString());
         if (file.isDirectory()) {
-          // Zip up directories so we can upload them to the artifact service.
+          File zippedFile;
+          HashCode hashCode;
           try {
-            filesToStage.add(createArtifactInformation(zipDirectory(file)));
+            zippedFile = zipDirectory(file);
+            hashCode = Files.asByteSource(zippedFile).hash(Hashing.sha256());
           } catch (IOException e) {
             throw new RuntimeException(e);
           }
+          artifactsBuilder.add(
+              artifactBuilder
+                  .setTypePayload(
+                      RunnerApi.ArtifactFilePayload.newBuilder()
+                          .setPath(zippedFile.getPath())
+                          .setSha256(hashCode.toString())
+                          .build()
+                          .toByteString())
+                  .build());
         } else {
-          filesToStage.add(createArtifactInformation(file));
+          HashCode hashCode;
+          try {
+            hashCode = Files.asByteSource(file).hash(Hashing.sha256());
+          } catch (IOException e) {
+            throw new RuntimeException(e);
 
 Review comment:
   Or would it be better to let the method throw an IOException? 
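
The two options raised here can be contrasted in a small sketch. It is an illustration under stated assumptions, not the code under review: `HashErrors`, `sizeOrThrow`, and `sizeUnchecked` are hypothetical names, and `Files.size` stands in for the hashing call that can fail with an `IOException`. The wrapping variant uses the JDK's `UncheckedIOException` instead of a bare `RuntimeException` so that the I/O cause stays typed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: propagate a checked IOException versus wrapping it. */
final class HashErrors {
  /** Option A: declare the checked exception and let the caller decide. */
  static long sizeOrThrow(Path file) throws IOException {
    return Files.size(file);
  }

  /** Option B: wrap at the call site, which keeps the signature exception-free. */
  static long sizeUnchecked(Path file) {
    try {
      return Files.size(file);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    // Create and delete a temp file so the path is known not to exist.
    Path missing = Files.createTempFile("missing", ".bin");
    Files.delete(missing);

    try {
      sizeOrThrow(missing);
    } catch (IOException e) {
      System.out.println("checked: " + e.getClass().getSimpleName());
    }
    try {
      sizeUnchecked(missing);
    } catch (UncheckedIOException e) {
      System.out.println("unchecked, cause: " + e.getCause().getClass().getSimpleName());
    }
  }
}
```

Option A pushes the decision to callers, which is awkward inside lambdas and streams. Option B keeps call sites simple but hides the failure mode from the signature.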
 



Issue Time Tracking
---

Worklog Id: (was: 418009)
Time Spent: 8h  (was: 7h 50m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=417967&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-417967
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 07/Apr/20 20:45
Start Date: 07/Apr/20 20:45
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #11205: [BEAM-9578] Enumerating 
artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-610611127
 
 
   @robertwb All comments are addressed. PTAL!
 



Issue Time Tracking
---

Worklog Id: (was: 417967)
Time Spent: 7h 40m  (was: 7.5h)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=417859&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-417859
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 07/Apr/20 18:05
Start Date: 07/Apr/20 18:05
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-610537944
 
 
   Retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 417859)
Time Spent: 7.5h  (was: 7h 20m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=417848&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-417848
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 07/Apr/20 17:49
Start Date: 07/Apr/20 17:49
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-610530293
 
 
   Retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 417848)
Time Spent: 7h 20m  (was: 7h 10m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=417847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-417847
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 07/Apr/20 17:49
Start Date: 07/Apr/20 17:49
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-610530129
 
 
   Retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 417847)
Time Spent: 7h 10m  (was: 7h)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=417789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-417789
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 07/Apr/20 16:13
Start Date: 07/Apr/20 16:13
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #11205: [BEAM-9578] Enumerating 
artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-610480293
 
 
   Since this continues to affect many Beam developers with test timeouts, do 
you think we could commit an intermediate fix for this? Or perhaps revert the 
experimental feature until a fix has been developed?
 



Issue Time Tracking
---

Worklog Id: (was: 417789)
Time Spent: 7h  (was: 6h 50m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=417158&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-417158
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 06/Apr/20 21:57
Start Date: 06/Apr/20 21:57
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-610058685
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 417158)
Time Spent: 6h 50m  (was: 6h 40m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=417157&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-417157
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 06/Apr/20 21:55
Start Date: 06/Apr/20 21:55
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r404412636
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultArtifactResolver.java
 ##
 @@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import java.util.List;
+import java.util.Map;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists;
+
+/**
+ * A default artifact resolver. This resolver applies {@link ResolutionFn} in the reversed order
+ * they registered i.e. the function registered later overrides the earlier one if they resolve the
+ * same artifact.
+ */
+public class DefaultArtifactResolver implements ArtifactResolver {
+  public static final ArtifactResolver INSTANCE = new DefaultArtifactResolver();
+
+  private List fns =
+      Lists.newArrayList(
+          (info) -> {
+            if (BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) {
+              return ImmutableList.of(info);
+            } else {
+              return ImmutableList.of();
 
 Review comment:
   Done. Returning an Optional list distinguishes three outcomes: failure,
success with empty output, and success with a list of artifacts.
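
The three outcomes mentioned in this reply can be sketched with `Optional<List<...>>` as the return type. This is an illustration only: `ResolutionOutcomes`, `resolve`, and `describe` are hypothetical names, plain strings stand in for type URNs and artifact protos, and the mapping of `Optional.empty()` to "failure" is an assumption about what the comment means.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: three resolution outcomes encoded as Optional<List<...>>. */
final class ResolutionOutcomes {
  /** Resolves an artifact name based on a made-up type string. */
  static Optional<List<String>> resolve(String type, String name) {
    switch (type) {
      case "file":
        // Success with a list of artifacts.
        return Optional.of(Collections.singletonList(name));
      case "nothing-to-stage":
        // Success with empty output.
        return Optional.of(Collections.<String>emptyList());
      default:
        // Failure: this resolver does not handle the type.
        return Optional.empty();
    }
  }

  /** Turns an outcome into a short human-readable label. */
  static String describe(Optional<List<String>> result) {
    if (!result.isPresent()) {
      return "failure";
    }
    return result.get().isEmpty() ? "success (empty)" : "success " + result.get();
  }

  public static void main(String[] args) {
    System.out.println(describe(resolve("file", "a.jar")));
    System.out.println(describe(resolve("nothing-to-stage", "a.jar")));
    System.out.println(describe(resolve("maven", "a.jar")));
  }
}
```

A bare empty list could not separate "resolved to nothing" from "could not resolve", which is the distinction a chain of resolvers needs in order to know whether to try the next function.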
 



Issue Time Tracking
---

Worklog Id: (was: 417157)
Time Spent: 6h 40m  (was: 6.5h)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=417156&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-417156
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 06/Apr/20 21:54
Start Date: 06/Apr/20 21:54
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r404411951
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
 ##
 @@ -214,24 +220,90 @@ public static Environment createProcessEnvironment(
       pathsToStage.addAll(stagingFiles);
     }
 
-    ImmutableList.Builder filesToStage = ImmutableList.builder();
+    ImmutableList.Builder> lazyArtifactsBuilder =
+        ImmutableList.builder();
     for (String path : pathsToStage) {
 
 Review comment:
   This for loop is fairly cheap, and from a stream of Suppliers we can easily
get additional performance benefits by creating a parallelStream. If we want to
parallelize the expensive computations, some boilerplate code is needed in
`getNonDeferredArtifacts()` anyway. I think building a stream is a nice way to
abstract that out.
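
A minimal sketch of the idea in this comment: a cheap loop builds one `Supplier` per path, and the expensive work runs only when the suppliers are evaluated, optionally in parallel. The names `LazyArtifactStream`, `lazyArtifacts`, `expensive`, and `resolveAll` are hypothetical, and string concatenation stands in for zipping and hashing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

/** Hypothetical sketch: defer per-path work behind Suppliers, then resolve in parallel. */
final class LazyArtifactStream {
  /** Cheap step: wraps each path in a Supplier without doing any real work. */
  static List<Supplier<String>> lazyArtifacts(List<String> paths) {
    return paths.stream()
        .map(path -> (Supplier<String>) () -> expensive(path))
        .collect(Collectors.toList());
  }

  /** Stand-in for the costly per-artifact computation. */
  static String expensive(String path) {
    return path + ".staged";
  }

  /** Expensive step: evaluates all suppliers; results keep the list's order. */
  static List<String> resolveAll(List<Supplier<String>> lazy) {
    return lazy.parallelStream().map(Supplier::get).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Supplier<String>> lazy = lazyArtifacts(Arrays.asList("a.jar", "classes"));
    System.out.println(resolveAll(lazy));
  }
}
```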
 



Issue Time Tracking
---

Worklog Id: (was: 417156)
Time Spent: 6.5h  (was: 6h 20m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415882&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415882
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 04/Apr/20 01:44
Start Date: 04/Apr/20 01:44
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r403407082
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1206,6 +1210,15 @@ message MavenPayload {
   string repository_url = 2;
 }
 
+message DeferredArtifactPayload {
+  // A unique string identifier assigned by the creator of this payload. The creator may use this key to confirm
+  // whether they can parse the data.
+  string key = 1;
 
 Review comment:
   This is going to have to get revamped for XLang, and since it isn't being
exported outside of the SDK for portable runners, we can easily change it.
 



Issue Time Tracking
---

Worklog Id: (was: 415882)
Time Spent: 6h 20m  (was: 6h 10m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415880&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415880
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 04/Apr/20 01:43
Start Date: 04/Apr/20 01:43
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-608952454
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 415880)
Time Spent: 6h 10m  (was: 6h)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415873&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415873
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 04/Apr/20 01:31
Start Date: 04/Apr/20 01:31
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r403403766
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1206,6 +1210,15 @@ message MavenPayload {
   string repository_url = 2;
 }
 
+message DeferredArtifactPayload {
+  // A unique string identifier assigned by the creator of this payload. The creator may use this key to confirm
+  // whether they can parse the data.
+  string key = 1;
 
 Review comment:
   Should this be uid? Any collisions here could be bad...
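
The collision concern can be illustrated with a minimal sketch. It assumes a hypothetical in-memory registry (`DeferredKeyRegistrySketch`, not part of Beam) that hands out a random UUID per registration and rejects a caller-chosen key that is already taken, so one payload can never be answered by another payload's producer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical registry for illustration only; this is not Beam's API.
final class DeferredKeyRegistrySketch {
  private final Map<String, Supplier<String>> producers = new HashMap<>();

  // Registers a producer under a freshly generated random UUID and returns that key.
  String register(Supplier<String> producer) {
    String key = UUID.randomUUID().toString();
    producers.put(key, producer);
    return key;
  }

  // Registers under a caller-chosen key and fails loudly on a collision.
  void registerWithKey(String key, Supplier<String> producer) {
    if (producers.putIfAbsent(key, producer) != null) {
      throw new IllegalStateException("Duplicate deferred artifact key: " + key);
    }
  }

  // Looks up the producer for a key and runs it; unknown keys are an error.
  String resolve(String key) {
    Supplier<String> producer = producers.get(key);
    if (producer == null) {
      throw new IllegalArgumentException("Unknown deferred artifact key: " + key);
    }
    return producer.get();
  }
}
```

Whether the field is named `key` or `uid`, the point of the sketch is that uniqueness has to be enforced or generated by whoever creates the payload.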
 



Issue Time Tracking
---

Worklog Id: (was: 415873)
Time Spent: 6h  (was: 5h 50m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415874
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 04/Apr/20 01:31
Start Date: 04/Apr/20 01:31
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r403405278
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
 ##
 @@ -214,24 +220,90 @@ public static Environment createProcessEnvironment(
   pathsToStage.addAll(stagingFiles);
 }
 
-ImmutableList.Builder filesToStage = ImmutableList.builder();
+ImmutableList.Builder<Supplier<ArtifactInformation>> lazyArtifactsBuilder =
+ImmutableList.builder();
 for (String path : pathsToStage) {
 
 Review comment:
   Don't we want this for loop to be lazy? 
   
   Rather than introducing intermediate streams of Suppliers, I think we could 
just rename the existing `getArtifacts()` to something like 
`getNonDeferredArtifacts()` and then call it during resolution. 
   
   ```
   if (key.equals(deferredArtifactPayload.getKey())) {
 return getNonDeferredArtifacts(options);
   }
   ```
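
A minimal sketch of that suggestion, under stated assumptions: `LazyStagingSketch`, `deferArtifacts`, and the string stand-ins for artifacts are hypothetical and only illustrate the idea of keeping the eager loop in one method and deferring the whole call, rather than building one Supplier per path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for illustration; this is not Beam's Environments class.
final class LazyStagingSketch {
  // Counts how often the expensive loop runs, for demonstration only.
  static final AtomicInteger expensiveCalls = new AtomicInteger();

  // Stand-in for the existing eager method: one pass over all paths.
  // In the PR this is where directories would be zipped and files hashed.
  static List<String> getNonDeferredArtifacts(List<String> pathsToStage) {
    expensiveCalls.incrementAndGet();
    List<String> artifacts = new ArrayList<>();
    for (String path : pathsToStage) {
      artifacts.add("artifact:" + path);
    }
    return artifacts;
  }

  // Defers the whole loop: nothing runs until the returned Supplier is invoked.
  static Supplier<List<String>> deferArtifacts(List<String> pathsToStage) {
    return () -> getNonDeferredArtifacts(pathsToStage);
  }
}
```

In this sketch the loop body stays unchanged and laziness is added at a single point, which is roughly what calling the renamed method from the resolver would achieve.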
 



Issue Time Tracking
---

Worklog Id: (was: 415874)
Time Spent: 6h  (was: 5h 50m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415875
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 04/Apr/20 01:31
Start Date: 04/Apr/20 01:31
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r403403332
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultArtifactResolver.java
 ##
 @@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import java.util.List;
+import java.util.Map;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists;
+
+/**
+ * A default artifact resolver. This resolver applies {@link ResolutionFn} in the reversed order
+ * they registered i.e. the function registered later overrides the earlier one if they resolve the
+ * same artifact.
+ */
+public class DefaultArtifactResolver implements ArtifactResolver {
+  public static final ArtifactResolver INSTANCE = new DefaultArtifactResolver();
+
+  private List fns =
+  Lists.newArrayList(
+  (info) -> {
+if (BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) {
+  return ImmutableList.of(info);
+} else {
+  return ImmutableList.of();
 
 Review comment:
   Is the empty list special? In particular, a deferred artifact may sometimes 
resolve to nothing, which is different from not being resolvable at all... I 
think we still need an Optional, null, or an exception to denote that an 
artifact is unresolvable by this resolver. 
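
One way to keep the two cases apart is sketched below. It assumes a hypothetical resolver (`OptionalResolutionSketch`, with strings standing in for artifact information) in which `Optional.empty()` means "this function cannot resolve the artifact" while `Optional.of(emptyList)` means "resolved, and the result is nothing". Functions are tried in reverse registration order, as the Javadoc in the diff describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch; the names and the use of String for artifacts are assumptions.
final class OptionalResolutionSketch {
  // Each function returns Optional.empty() when it does not handle the artifact.
  private final List<Function<String, Optional<List<String>>>> fns = new ArrayList<>();

  void register(Function<String, Optional<List<String>>> fn) {
    fns.add(fn);
  }

  // Tries the most recently registered function first.
  // A present but empty list is a valid answer; only "no function answered" is an error.
  List<String> resolve(String artifact) {
    for (int i = fns.size() - 1; i >= 0; i--) {
      Optional<List<String>> resolved = fns.get(i).apply(artifact);
      if (resolved.isPresent()) {
        return resolved.get();
      }
    }
    throw new IllegalArgumentException("Cannot resolve artifact information: " + artifact);
  }
}
```

With this shape, a deferred artifact that legitimately expands to zero files no longer looks like a resolution failure.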
 



Issue Time Tracking
---

Worklog Id: (was: 415875)
Time Spent: 6h  (was: 5h 50m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415866&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415866
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 04/Apr/20 01:12
Start Date: 04/Apr/20 01:12
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r403402959
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1206,6 +1210,11 @@ message MavenPayload {
   string repository_url = 2;
 }
 
+message DeferredArtifactPayload {
+  // A id for deferred artifacts.
+  string id = 1;
 
 Review comment:
   As discussed (but putting here for the record) having a proxy artifact type 
could solve this issue. 
 



Issue Time Tracking
---

Worklog Id: (was: 415866)
Time Spent: 5h 50m  (was: 5h 40m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415853&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415853
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 04/Apr/20 00:03
Start Date: 04/Apr/20 00:03
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r403390422
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultArtifactResolver.java
 ##
 @@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.Collectors;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+
+public class DefaultArtifactResolver implements ArtifactResolver {
+  public static ArtifactResolver INSTANCE = new DefaultArtifactResolver();
+
+  private ResolutionFn resolver =
+  (info) -> {
+if (BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) {
+  return Optional.of(info);
+} else {
+  return Optional.empty();
+}
+  };
+
+  @Override
+  public void register(ResolutionFn fn) {
+resolver =
 
 Review comment:
   done.
 



Issue Time Tracking
---

Worklog Id: (was: 415853)
Time Spent: 5h 40m  (was: 5.5h)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415852&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415852
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 03/Apr/20 23:59
Start Date: 03/Apr/20 23:59
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #11205: [BEAM-9578] Enumerating 
artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-608857763
 
 
   @lukecwik Ready to merge. Please trigger the test and take a final look.
 



Issue Time Tracking
---

Worklog Id: (was: 415852)
Time Spent: 5.5h  (was: 5h 20m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415824&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415824
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 03/Apr/20 23:22
Start Date: 03/Apr/20 23:22
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-608803659
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 415824)
Time Spent: 5h 20m  (was: 5h 10m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415288&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415288
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 03/Apr/20 02:25
Start Date: 03/Apr/20 02:25
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402698752
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1206,6 +1210,11 @@ message MavenPayload {
   string repository_url = 2;
 }
 
+message DeferredArtifactPayload {
+  // A id for deferred artifacts.
+  string id = 1;
 
 Review comment:
   That's a good point about the key. In general this will become a problem 
for all artifacts, since none of them have unique keys associated with them 
unless an intermediary resolves the artifact immediately and possibly "renames" 
the contents to make it unique.
   
   If we ever want to support multiple layers of expansion for XLang, we'll want 
to proxy any artifact resolution/retrieval calls through the layers and not 
require each layer to have a *copy* of the artifact.
 



Issue Time Tracking
---

Worklog Id: (was: 415288)
Time Spent: 5h 10m  (was: 5h)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415284&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415284
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 03/Apr/20 02:23
Start Date: 03/Apr/20 02:23
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402699096
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1206,6 +1210,15 @@ message MavenPayload {
   string repository_url = 2;
 }
 
+message DeferredArtifactPayload {
+  // A unique string identifier assigned by the creator of this payload. The creator may use this key to confirm
+  // whether they can parse the data.
+  string key = 1;
+
+  // A data for deferred artifacts. Interpretation of bytes is delegated to the creator of this payload.
 
 Review comment:
   ```suggestion
 // Data for deferred artifacts. Interpretation of bytes is delegated to the creator of this payload.
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 415284)
Time Spent: 4h 50m  (was: 4h 40m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415285&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415285
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 03/Apr/20 02:23
Start Date: 03/Apr/20 02:23
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402699398
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultArtifactResolver.java
 ##
 @@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+/**
+ * A default artifact resolver. This resolver applies {@link ResolutionFn} first matched in the
+ * order they registered.
+ */
+public class DefaultArtifactResolver implements ArtifactResolver {
+  public static final ArtifactResolver INSTANCE = new DefaultArtifactResolver();
+
+  private ResolutionFn resolver =
+  (info) -> {
+if (BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) {
+  return ImmutableList.of(info);
+} else {
+  return ImmutableList.of();
+}
+  };
+
+  @Override
+  public void register(ResolutionFn fn) {
+resolver =
+(info) -> {
+  List resolved = fn.resolve(info);
+  if (!resolved.isEmpty()) {
+return resolved;
+  } else {
+return resolver.resolve(info);
+  }
+};
+  }
+
+  @Override
+  public RunnerApi.Pipeline resolveArtifacts(RunnerApi.Pipeline pipeline) {
+ImmutableMap.Builder environmentMapBuilder =
+ImmutableMap.builder();
+for (Map.Entry entry :
+pipeline.getComponents().getEnvironmentsMap().entrySet()) {
+  List resolvedDependencies =
+  entry
+  .getValue()
+  .getDependenciesList()
+  .parallelStream()
+  .flatMap(
+  (info) -> {
+List resolved = resolver.resolve(info);
+if (resolved.isEmpty()) {
+  throw new RuntimeException(
+  String.format("cannot resolve artifact information: %s", info));
 
 Review comment:
   ```suggestion
 String.format("Cannot resolve artifact information: %s", info));
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 415285)
Time Spent: 5h  (was: 4h 50m)


[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415286&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415286
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 03/Apr/20 02:23
Start Date: 03/Apr/20 02:23
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402700601
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
 ##
 @@ -214,24 +220,87 @@ public static Environment createProcessEnvironment(
   pathsToStage.addAll(stagingFiles);
 }
 
-ImmutableList.Builder filesToStage = ImmutableList.builder();
+ImmutableList.Builder<Supplier<ArtifactInformation>> lazyArtifactsBuilder =
+ImmutableList.builder();
 for (String path : pathsToStage) {
   File file = new File(path);
-  if (new File(path).exists()) {
-// Spurious items get added to the classpath. Filter by just those that exist.
-if (file.isDirectory()) {
-  // Zip up directories so we can upload them to the artifact service.
-  try {
-filesToStage.add(createArtifactInformation(zipDirectory(file)));
-  } catch (IOException e) {
-throw new RuntimeException(e);
-  }
-} else {
-  filesToStage.add(createArtifactInformation(file));
-}
+  // Spurious items get added to the classpath. Filter by just those that exist.
+  if (file.exists()) {
+ArtifactInformation.Builder artifactBuilder = ArtifactInformation.newBuilder();
+artifactBuilder.setTypeUrn(BeamUrns.getUrn(StandardArtifacts.Types.FILE));
+artifactBuilder.setRoleUrn(BeamUrns.getUrn(StandardArtifacts.Roles.STAGING_TO));
+artifactBuilder.setRolePayload(
+RunnerApi.ArtifactStagingToRolePayload.newBuilder()
+.setStagedName(createStagingFileName(file))
+.build()
+.toByteString());
+lazyArtifactsBuilder.add(
+file.isDirectory()
+? () -> {
+  File zippedFile;
+  HashCode hashCode;
+  try {
+zippedFile = zipDirectory(file);
+hashCode = Files.asByteSource(zippedFile).hash(Hashing.sha256());
+  } catch (IOException e) {
+throw new RuntimeException(e);
+  }
+  return artifactBuilder
+  .setTypePayload(
+  RunnerApi.ArtifactFilePayload.newBuilder()
+  .setPath(zippedFile.getPath())
+  .setSha256(hashCode.toString())
+  .build()
+  .toByteString())
+  .build();
+}
+: () -> {
+  HashCode hashCode;
+  try {
+hashCode = Files.asByteSource(file).hash(Hashing.sha256());
+  } catch (IOException e) {
+throw new RuntimeException(e);
+  }
+  return artifactBuilder
+  .setTypePayload(
+  RunnerApi.ArtifactFilePayload.newBuilder()
+  .setPath(file.getPath())
+  .setSha256(hashCode.toString())
+  .build()
+  .toByteString())
+  .build();
+});
   }
 }
-return filesToStage.build();
+
+List<Supplier<ArtifactInformation>> lazyArtifacts = lazyArtifactsBuilder.build();
+String id = UUID.randomUUID().toString();
+DefaultArtifactResolver.INSTANCE.register(
+(info) -> {
+  if (BeamUrns.getUrn(StandardArtifacts.Types.DEFERRED).equals(info.getTypeUrn())) {
+RunnerApi.DeferredArtifactPayload deferredArtifactPayload;
+try {
+  deferredArtifactPayload =
+  RunnerApi.DeferredArtifactPayload.parseFrom(info.getTypePayload());
+} catch (InvalidProtocolBufferException e) {
+  throw new RuntimeException("Error parsing deferred artifact payload.", e);
+}
+if (id.equals(deferredArtifactPayload.getKey())) {
+  return lazyArtifacts.stream().map(Supplier::get).collect(Collectors.toList());
+} else {
+  return ImmutableList.of();
+}
+  } else {
+return ImmutableList.of();
+  }
+});
+
+return ImmutableList.of(
+ArtifactInformation.newBuilder()
+.setTypeUrn(BeamUrns.getUrn(StandardArtifacts.Types.DEFERRED))
+.setT
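
The diff fragment above defers zipping and hashing by wrapping each artifact in a supplier and handing out a single placeholder identified by a random key. The following is a minimal JDK-only sketch of that idea, not the code from the PR: `DeferredArtifactsSketch`, `Artifact`, `add`, `resolve`, and `expensiveCalls` are hypothetical names introduced for illustration, and the counter stands in for the real `zipDirectory` and SHA-256 work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;
import java.util.function.Supplier;
import java.util.stream.Collectors;

/** JDK-only sketch of deferring expensive artifact computation behind a key. */
public class DeferredArtifactsSketch {
  /** Hypothetical stand-in for RunnerApi.ArtifactInformation. */
  public static final class Artifact {
    public final String path;

    Artifact(String path) {
      this.path = path;
    }
  }

  // Random key identifying this batch of artifacts, like the UUID in the diff.
  private final String key = UUID.randomUUID().toString();
  private final List<Supplier<Artifact>> lazyArtifacts = new ArrayList<>();
  // Counts how often the "expensive" step actually ran.
  private int expensiveCalls = 0;

  /** Registers a path without doing any expensive work yet. */
  public void add(String path) {
    lazyArtifacts.add(
        () -> {
          expensiveCalls++; // stands in for zipping a directory and hashing the file
          return new Artifact(path);
        });
  }

  public String key() {
    return key;
  }

  public int expensiveCalls() {
    return expensiveCalls;
  }

  /** Runs the suppliers only when the caller presents the matching key. */
  public List<Artifact> resolve(String requestedKey) {
    if (!key.equals(requestedKey)) {
      return Collections.emptyList();
    }
    return lazyArtifacts.stream().map(Supplier::get).collect(Collectors.toList());
  }
}
```

In this sketch, calling `add` any number of times costs nothing beyond allocating the lambdas; the expensive step runs once per artifact each time `resolve` is called with the matching key, so a caller that resolves repeatedly would want to cache the result.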

[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415281&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415281
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 03/Apr/20 02:16
Start Date: 03/Apr/20 02:16
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402697023
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1206,6 +1210,15 @@ message MavenPayload {
   string repository_url = 2;
 }
 
+message DeferredArtifactPayload {
+  // A unique string identifier assigned by the creator of this payload. The creator may use this key to confirm
+  // whether they can parse the data.
+  string key = 1;
+
+  // A data for deferred artifacts. Interpretation of bytes is delegated to the creator of this payload.
+  bytes data = 2;
 
 Review comment:
   ```suggestion
 bytes payload = 2;
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 415281)
Time Spent: 4h 40m  (was: 4.5h)

> Enumerating artifacts is too expensive in Java
> --
>
> Key: BEAM-9578
> URL: https://issues.apache.org/jira/browse/BEAM-9578
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Heejong Lee
>Priority: Critical
>  Labels: portability
> Fix For: 2.21.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which 
> effectively call *Environments#createOrGetDefaultEnvironment* which causes 
> [artifacts to be 
> computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114].
> This leads to zipping directories for non-jar dependencies.
> Similar problems may exist for Python/Go.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415280
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 03/Apr/20 02:15
Start Date: 03/Apr/20 02:15
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402698752
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1206,6 +1210,11 @@ message MavenPayload {
   string repository_url = 2;
 }
 
+message DeferredArtifactPayload {
+  // A id for deferred artifacts.
+  string id = 1;
 
 Review comment:
   That's a good point about the key, and in general this will become a problem
   for all artifacts, since none of them have unique keys associated with them
   unless an intermediary resolves the artifact immediately and possibly
   "renames" the contents to make it unique.
   
   If we ever want to support multiple layers of expansion for XLang, we'll
   want to proxy any artifact resolution/retrieval calls through the layers.
 



Issue Time Tracking
---

Worklog Id: (was: 415280)
Time Spent: 4.5h  (was: 4h 20m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415279&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415279
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 03/Apr/20 02:15
Start Date: 03/Apr/20 02:15
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402697023
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1206,6 +1210,15 @@ message MavenPayload {
   string repository_url = 2;
 }
 
+message DeferredArtifactPayload {
+  // A unique string identifier assigned by the creator of this payload. The creator may use this key to confirm
+  // whether they can parse the data.
+  string key = 1;
+
+  // A data for deferred artifacts. Interpretation of bytes is delegated to the creator of this payload.
+  bytes data = 2;
 
 Review comment:
   ```suggestion
 bytes payload = 2;
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 415279)
Time Spent: 4h 20m  (was: 4h 10m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415258&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415258
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 03/Apr/20 01:31
Start Date: 03/Apr/20 01:31
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402688050
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ArtifactResolver.java
 ##
 @@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import java.util.Optional;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+
+public interface ArtifactResolver {
+  void register(ResolutionFn fn);
+
+  RunnerApi.Pipeline resolveArtifacts(RunnerApi.Pipeline pipeline);
+
+  interface ResolutionFn {
+Optional<RunnerApi.ArtifactInformation> resolve(RunnerApi.ArtifactInformation info);
 
 Review comment:
   changed to List.
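
   Returning a list rather than an `Optional` lets one deferred placeholder expand into several concrete artifacts. A minimal sketch of that shape follows, assuming plain strings in place of `RunnerApi.ArtifactInformation`; `ListResolutionSketch` and `resolveAll` are hypothetical names, and keeping unresolved entries unchanged is an assumption of this sketch rather than something the PR states.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a resolution function that may expand one entry into many. */
public class ListResolutionSketch {
  /** Hypothetical counterpart of ResolutionFn, operating on plain strings. */
  public interface ResolutionFn {
    // An empty list means "this function does not handle the entry".
    List<String> resolve(String info);
  }

  /** Applies fn to every entry, splicing in whatever it expands to. */
  public static List<String> resolveAll(List<String> infos, ResolutionFn fn) {
    List<String> out = new ArrayList<>();
    for (String info : infos) {
      List<String> resolved = fn.resolve(info);
      if (resolved.isEmpty()) {
        out.add(info); // assumption: unresolved entries pass through unchanged
      } else {
        out.addAll(resolved);
      }
    }
    return out;
  }
}
```

   With an `Optional` return type the same driver could only replace an entry one-for-one, which does not fit a single placeholder standing for a whole classpath.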
 



Issue Time Tracking
---

Worklog Id: (was: 415258)
Time Spent: 4h 10m  (was: 4h)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415255&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415255
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 03/Apr/20 01:23
Start Date: 03/Apr/20 01:23
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402685778
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1206,6 +1210,11 @@ message MavenPayload {
   string repository_url = 2;
 }
 
+message DeferredArtifactPayload {
+  // A id for deferred artifacts.
+  string id = 1;
 
 Review comment:
   Don't we need at least a key field (or urn or identifier) which can be used
   to check whether the payload is parsable by the creator? Otherwise, it would
   be pretty hard to know where the bytes payload originally came from.
 



Issue Time Tracking
---

Worklog Id: (was: 415255)
Time Spent: 4h  (was: 3h 50m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415248&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415248
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 03/Apr/20 01:18
Start Date: 03/Apr/20 01:18
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402684550
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
 ##
 @@ -215,22 +221,76 @@ public static Environment createProcessEnvironment(
 }
 
 ImmutableList.Builder<ArtifactInformation> filesToStage = ImmutableList.builder();
+ImmutableMap.Builder<String, Supplier<ArtifactInformation>>
+lazyArtifactsBuilder = ImmutableMap.builder();
 for (String path : pathsToStage) {
   File file = new File(path);
-  if (new File(path).exists()) {
-// Spurious items get added to the classpath. Filter by just those that exist.
-if (file.isDirectory()) {
-  // Zip up directories so we can upload them to the artifact service.
-  try {
-filesToStage.add(createArtifactInformation(zipDirectory(file)));
-  } catch (IOException e) {
-throw new RuntimeException(e);
-  }
-} else {
-  filesToStage.add(createArtifactInformation(file));
-}
+  // Spurious items get added to the classpath. Filter by just those that exist.
+  if (file.exists()) {
+String id = UUID.randomUUID().toString();
 
 Review comment:
   done.
 



Issue Time Tracking
---

Worklog Id: (was: 415248)
Time Spent: 3h 50m  (was: 3h 40m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415247&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415247
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 03/Apr/20 01:18
Start Date: 03/Apr/20 01:18
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402684484
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultArtifactResolver.java
 ##
 @@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.Collectors;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+
+public class DefaultArtifactResolver implements ArtifactResolver {
+  public static ArtifactResolver INSTANCE = new DefaultArtifactResolver();
+
+  private ResolutionFn resolver =
+  (info) -> {
+if (BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) {
+  return Optional.of(info);
+} else {
+  return Optional.empty();
+}
+  };
+
+  @Override
+  public void register(ResolutionFn fn) {
+resolver =
+(info) -> {
+  Optional<RunnerApi.ArtifactInformation> resolved = fn.resolve(info);
+  if (resolved.isPresent()) {
+return resolved;
+  } else {
+return resolver.resolve(info);
+  }
+};
+  }
+
+  @Override
+  public RunnerApi.Pipeline resolveArtifacts(RunnerApi.Pipeline pipeline) {
 
 Review comment:
   done :smile:
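
   One detail in the quoted `register` method may be worth checking: the lambda refers to the `resolver` field itself, and Java reads a field when the lambda runs, not when it is created, so the fallback call would reach the newly assigned lambda again instead of the previous resolver. A minimal sketch of the same chaining that snapshots the previous resolver in a local variable first is shown below. It uses `Function<String, Optional<String>>` in place of `ResolutionFn`, and `ChainedResolverSketch` is a hypothetical name, so treat it as an illustration under those assumptions rather than the PR's code.

```java
import java.util.Optional;
import java.util.function.Function;

/** Sketch of chaining resolution functions, newest first, with a fallback. */
public class ChainedResolverSketch {
  // Default resolver: only "file" entries are considered already resolved.
  private Function<String, Optional<String>> resolver =
      info -> "file".equals(info) ? Optional.of(info) : Optional.empty();

  /** Puts fn in front of everything registered so far. */
  public void register(Function<String, Optional<String>> fn) {
    // Snapshot the current chain; the lambda must not read the field later,
    // because by then the field points at the lambda itself.
    final Function<String, Optional<String>> previous = resolver;
    resolver =
        info -> {
          Optional<String> resolved = fn.apply(info);
          return resolved.isPresent() ? resolved : previous.apply(info);
        };
  }

  public Optional<String> resolve(String info) {
    return resolver.apply(info);
  }
}
```

   Each call to `register` wraps the existing chain, so the most recently registered function is consulted first and the default resolver last.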
 



Issue Time Tracking
---

Worklog Id: (was: 415247)
Time Spent: 3h 40m  (was: 3.5h)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415246&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415246
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 03/Apr/20 01:18
Start Date: 03/Apr/20 01:18
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402684430
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultArtifactResolver.java
 ##
 @@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.Collectors;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+
+public class DefaultArtifactResolver implements ArtifactResolver {
 
 Review comment:
   done.
 



Issue Time Tracking
---

Worklog Id: (was: 415246)
Time Spent: 3.5h  (was: 3h 20m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415245&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415245
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 03/Apr/20 01:18
Start Date: 03/Apr/20 01:18
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402684405
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ArtifactResolver.java
 ##
 @@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import java.util.Optional;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+
+public interface ArtifactResolver {
 
 Review comment:
   done.
 



Issue Time Tracking
---

Worklog Id: (was: 415245)
Time Spent: 3h 20m  (was: 3h 10m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414987&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414987
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 02/Apr/20 19:20
Start Date: 02/Apr/20 19:20
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-608055038
 
 
   > Friendly question: is this PR close? I'm still having to trigger Java
   > precommits 3-4 times in order to get a green run, partially due to these
   > timeouts.
   
   We have agreed on the solution and this PR represents an early version of
   it so I would say that we are close.
 



Issue Time Tracking
---

Worklog Id: (was: 414987)
Time Spent: 3h 10m  (was: 3h)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414961
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 02/Apr/20 18:37
Start Date: 02/Apr/20 18:37
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-608034034
 
 
   Friendly question: is this PR close? I'm still having to trigger Java
   precommits 3-4 times in order to get a green run, partially due to these
   timeouts.
 



Issue Time Tracking
---

Worklog Id: (was: 414961)
Time Spent: 3h  (was: 2h 50m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414922
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 02/Apr/20 17:50
Start Date: 02/Apr/20 17:50
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402501898
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
 ##
 @@ -215,22 +221,76 @@ public static Environment createProcessEnvironment(
 }
 
 ImmutableList.Builder<ArtifactInformation> filesToStage = ImmutableList.builder();
+ImmutableMap.Builder<String, Supplier<ArtifactInformation>>
+lazyArtifactsBuilder = ImmutableMap.builder();
 for (String path : pathsToStage) {
   File file = new File(path);
-  if (new File(path).exists()) {
-// Spurious items get added to the classpath. Filter by just those that exist.
-if (file.isDirectory()) {
-  // Zip up directories so we can upload them to the artifact service.
-  try {
-filesToStage.add(createArtifactInformation(zipDirectory(file)));
-  } catch (IOException e) {
-throw new RuntimeException(e);
-  }
-} else {
-  filesToStage.add(createArtifactInformation(file));
-}
+  // Spurious items get added to the classpath. Filter by just those that exist.
+  if (file.exists()) {
+String id = UUID.randomUUID().toString();
 
 Review comment:
   Yes, I think the goal is to avoid even this enumeration until we actually
   need it.
 



Issue Time Tracking
---

Worklog Id: (was: 414922)
Time Spent: 2h 40m  (was: 2.5h)

> Enumerating artifacts is too expensive in Java
> --
>
> Key: BEAM-9578
> URL: https://issues.apache.org/jira/browse/BEAM-9578
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Heejong Lee
>Priority: Critical
>  Labels: portability
> Fix For: 2.21.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which 
> effectively call *Environments#createOrGetDefaultEnvironment* which causes 
> [artifacts to be 
> computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114].
> This leads to zipping directories for non-jar dependencies.
> Similar problems may exist for Python/Go.





[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414923&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414923
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 02/Apr/20 17:50
Start Date: 02/Apr/20 17:50
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402499335
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ArtifactResolver.java
 ##
 @@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import java.util.Optional;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+
+public interface ArtifactResolver {
+  void register(ResolutionFn fn);
+
+  RunnerApi.Pipeline resolveArtifacts(RunnerApi.Pipeline pipeline);
+
+  interface ResolutionFn {
+Optional 
resolve(RunnerApi.ArtifactInformation info);
 
 Review comment:
   As with the resolution API, one may want to attempt to resolve multiple 
artifacts (e.g. maven dependencies) simultaneously. One may also need to return 
multiple artifacts as the resolution of a single artifact (e.g. the deferred 
"ambient environment" one). 
 
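A minimal sketch of the batch shape suggested above, assuming hypothetical stand-in types: `Artifact` replaces `RunnerApi.ArtifactInformation`, and the `"deferred"` and `"file"` strings are placeholder type URNs, not the real ones. It is not the PR's code. The function takes several artifacts at once and may return more than one artifact per input:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BatchResolutionSketch {
  /** Hypothetical stand-in for RunnerApi.ArtifactInformation. */
  static final class Artifact {
    final String typeUrn;
    final String payload;

    Artifact(String typeUrn, String payload) {
      this.typeUrn = typeUrn;
      this.payload = payload;
    }
  }

  /** Hypothetical batch variant of ResolutionFn: many artifacts in, many out. */
  interface BatchResolutionFn {
    List<Artifact> resolve(List<Artifact> artifacts);
  }

  /** Expands each "deferred" artifact into two file artifacts; passes others through. */
  static final BatchResolutionFn EXPAND_DEFERRED =
      artifacts -> {
        List<Artifact> resolved = new ArrayList<>();
        for (Artifact artifact : artifacts) {
          if ("deferred".equals(artifact.typeUrn)) {
            // Placeholder expansion; real code would enumerate the environment.
            resolved.add(new Artifact("file", artifact.payload + "/a.jar"));
            resolved.add(new Artifact("file", artifact.payload + "/b.jar"));
          } else {
            resolved.add(artifact);
          }
        }
        return resolved;
      };

  public static void main(String[] args) {
    List<Artifact> input =
        Arrays.asList(new Artifact("deferred", "ambient"), new Artifact("file", "/tmp/c.jar"));
    for (Artifact artifact : EXPAND_DEFERRED.resolve(input)) {
      System.out.println(artifact.typeUrn + " " + artifact.payload);
    }
  }
}
```

Taking a list as input would also let a resolver group related requests (e.g. all Maven coordinates) into one lookup.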



Issue Time Tracking
---

Worklog Id: (was: 414923)

> Enumerating artifacts is too expensive in Java
> --
>
> Key: BEAM-9578
> URL: https://issues.apache.org/jira/browse/BEAM-9578
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Heejong Lee
>Priority: Critical
>  Labels: portability
> Fix For: 2.21.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which 
> effectively call *Environments#createOrGetDefaultEnvironment* which causes 
> [artifacts to be 
> computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114].
> This leads to zipping directories for non-jar dependencies.
> Similar problems may exist for Python/Go.





[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414921&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414921
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 02/Apr/20 17:50
Start Date: 02/Apr/20 17:50
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402500648
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultArtifactResolver.java
 ##
 @@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.Collectors;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+
+public class DefaultArtifactResolver implements ArtifactResolver {
+  public static ArtifactResolver INSTANCE = new DefaultArtifactResolver();
+
+  private ResolutionFn resolver =
+  (info) -> {
+if 
(BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn()))
 {
+  return Optional.of(info);
+} else {
+  return Optional.empty();
+}
+  };
+
+  @Override
+  public void register(ResolutionFn fn) {
+resolver =
 
 Review comment:
   I wonder if having an explicit List would be easier to 
understand than the implicit chaining in these abstract classes. 
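A rough sketch of the explicit-list alternative, assuming a simplified hypothetical `ResolutionFn` that maps a type URN string to an optional result rather than working on `RunnerApi.ArtifactInformation`. The newest registration is tried first, which is intended to match the precedence of the chained lambda in the diff:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class ListResolverSketch {
  /** Hypothetical simplification: resolves a type URN string instead of ArtifactInformation. */
  interface ResolutionFn {
    Optional<String> resolve(String typeUrn);
  }

  // Most recently registered function first, mirroring the chained version's precedence.
  private final List<ResolutionFn> fns = new ArrayList<>();

  void register(ResolutionFn fn) {
    fns.add(0, fn);
  }

  /** Returns the first non-empty result, or empty if no function can resolve the URN. */
  Optional<String> resolve(String typeUrn) {
    for (ResolutionFn fn : fns) {
      Optional<String> resolved = fn.resolve(typeUrn);
      if (resolved.isPresent()) {
        return resolved;
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    ListResolverSketch resolver = new ListResolverSketch();
    resolver.register(urn -> urn.equals("file") ? Optional.of("default") : Optional.empty());
    resolver.register(urn -> urn.equals("maven") ? Optional.of("maven-fn") : Optional.empty());
    System.out.println(resolver.resolve("file").orElse("unresolved"));
    System.out.println(resolver.resolve("maven").orElse("unresolved"));
    System.out.println(resolver.resolve("pypi").orElse("unresolved"));
  }
}
```

With a list, the lookup order is visible in one loop instead of being spread across nested closures.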
 



Issue Time Tracking
---

Worklog Id: (was: 414921)
Time Spent: 2h 40m  (was: 2.5h)

> Enumerating artifacts is too expensive in Java
> --
>
> Key: BEAM-9578
> URL: https://issues.apache.org/jira/browse/BEAM-9578
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Heejong Lee
>Priority: Critical
>  Labels: portability
> Fix For: 2.21.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which 
> effectively call *Environments#createOrGetDefaultEnvironment* which causes 
> [artifacts to be 
> computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114].
> This leads to zipping directories for non-jar dependencies.
> Similar problems may exist for Python/Go.





[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414924&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414924
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 02/Apr/20 17:50
Start Date: 02/Apr/20 17:50
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402494768
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ArtifactResolver.java
 ##
 @@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import java.util.Optional;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+
+public interface ArtifactResolver {
+  void register(ResolutionFn fn);
 
 Review comment:
   I agree. However, this might be more limited. E.g. one could have more 
than one resolver per type, each of which can only resolve a subset, or 
alternatively one could want to resolve more than one type (or any type), e.g. 
a proxying resolver. 
 



Issue Time Tracking
---

Worklog Id: (was: 414924)
Time Spent: 2h 50m  (was: 2h 40m)

> Enumerating artifacts is too expensive in Java
> --
>
> Key: BEAM-9578
> URL: https://issues.apache.org/jira/browse/BEAM-9578
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Heejong Lee
>Priority: Critical
>  Labels: portability
> Fix For: 2.21.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which 
> effectively call *Environments#createOrGetDefaultEnvironment* which causes 
> [artifacts to be 
> computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114].
> This leads to zipping directories for non-jar dependencies.
> Similar problems may exist for Python/Go.





[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414920&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414920
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 02/Apr/20 17:50
Start Date: 02/Apr/20 17:50
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402492044
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1206,6 +1210,11 @@ message MavenPayload {
   string repository_url = 2;
 }
 
+message DeferredArtifactPayload {
+  // A id for deferred artifacts.
+  string id = 1;
 
 Review comment:
   +1
 



Issue Time Tracking
---

Worklog Id: (was: 414920)
Time Spent: 2.5h  (was: 2h 20m)

> Enumerating artifacts is too expensive in Java
> --
>
> Key: BEAM-9578
> URL: https://issues.apache.org/jira/browse/BEAM-9578
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Heejong Lee
>Priority: Critical
>  Labels: portability
> Fix For: 2.21.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which 
> effectively call *Environments#createOrGetDefaultEnvironment* which causes 
> [artifacts to be 
> computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114].
> This leads to zipping directories for non-jar dependencies.
> Similar problems may exist for Python/Go.





[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414896&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414896
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 02/Apr/20 17:27
Start Date: 02/Apr/20 17:27
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402479848
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultArtifactResolver.java
 ##
 @@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.Collectors;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+
+public class DefaultArtifactResolver implements ArtifactResolver {
 
 Review comment:
   Class comment
 



Issue Time Tracking
---

Worklog Id: (was: 414896)
Time Spent: 2h 20m  (was: 2h 10m)

> Enumerating artifacts is too expensive in Java
> --
>
> Key: BEAM-9578
> URL: https://issues.apache.org/jira/browse/BEAM-9578
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Heejong Lee
>Priority: Critical
>  Labels: portability
> Fix For: 2.21.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which 
> effectively call *Environments#createOrGetDefaultEnvironment* which causes 
> [artifacts to be 
> computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114].
> This leads to zipping directories for non-jar dependencies.
> Similar problems may exist for Python/Go.





[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414898&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414898
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 02/Apr/20 17:27
Start Date: 02/Apr/20 17:27
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402482365
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultArtifactResolver.java
 ##
 @@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.Collectors;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+
+public class DefaultArtifactResolver implements ArtifactResolver {
+  public static ArtifactResolver INSTANCE = new DefaultArtifactResolver();
+
+  private ResolutionFn resolver =
+  (info) -> {
+if 
(BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn()))
 {
+  return Optional.of(info);
+} else {
+  return Optional.empty();
+}
+  };
+
+  @Override
+  public void register(ResolutionFn fn) {
+resolver =
+(info) -> {
+  Optional resolved = fn.resolve(info);
+  if (resolved.isPresent()) {
+return resolved;
+  } else {
+return resolver.resolve(info);
+  }
+};
+  }
+
+  @Override
+  public RunnerApi.Pipeline resolveArtifacts(RunnerApi.Pipeline pipeline) {
 
 Review comment:
   The level of nesting in this method is getting a little silly.
   
   Use local variables to logically describe what you're doing and consider 
dropping the use of streams.
 



Issue Time Tracking
---

Worklog Id: (was: 414898)
Time Spent: 2h 20m  (was: 2h 10m)

> Enumerating artifacts is too expensive in Java
> --
>
> Key: BEAM-9578
> URL: https://issues.apache.org/jira/browse/BEAM-9578
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Heejong Lee
>Priority: Critical
>  Labels: portability
> Fix For: 2.21.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which 
> effectively call *Environments#createOrGetDefaultEnvironment* which causes 
> [artifacts to be 
> computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114].
> This leads to zipping directories for non-jar dependencies.
> Similar problems may exist for Python/Go.





[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414897
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 02/Apr/20 17:27
Start Date: 02/Apr/20 17:27
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402482918
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java
 ##
 @@ -215,22 +221,76 @@ public static Environment createProcessEnvironment(
 }
 
 ImmutableList.Builder filesToStage = 
ImmutableList.builder();
+ImmutableMap.Builder>
+lazyArtifactsBuilder = ImmutableMap.builder();
 for (String path : pathsToStage) {
   File file = new File(path);
-  if (new File(path).exists()) {
-// Spurious items get added to the classpath. Filter by just those 
that exist.
-if (file.isDirectory()) {
-  // Zip up directories so we can upload them to the artifact service.
-  try {
-filesToStage.add(createArtifactInformation(zipDirectory(file)));
-  } catch (IOException e) {
-throw new RuntimeException(e);
-  }
-} else {
-  filesToStage.add(createArtifactInformation(file));
-}
+  // Spurious items get added to the classpath. Filter by just those that 
exist.
+  if (file.exists()) {
+String id = UUID.randomUUID().toString();
 
 Review comment:
   Note that you could put one ID into the map for the entire list of files if 
you allowed ResolutionFn to return a List/Collection of artifacts.
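To illustrate the single-ID idea, here is a small sketch under stated assumptions: paths are plain strings, the registry is a hypothetical in-memory map, and the `enumerations` counter exists only to show when the expensive work runs. None of these names come from the PR:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

class DeferredListSketch {
  // Hypothetical in-memory registry: one deferred id maps to a supplier of all paths.
  private final Map<String, Supplier<List<String>>> deferred = new HashMap<>();

  // Counts how often the (potentially expensive) enumeration actually ran.
  int enumerations = 0;

  /** Registers the whole list under a single id without touching the file system. */
  String defer(List<String> pathsToStage) {
    String id = UUID.randomUUID().toString();
    deferred.put(
        id,
        () -> {
          enumerations++;
          // Real code would filter non-existent paths and zip directories here.
          return new ArrayList<>(pathsToStage);
        });
    return id;
  }

  /** Runs the enumeration only when the artifacts are actually needed. */
  List<String> resolve(String id) {
    Supplier<List<String>> supplier = deferred.get(id);
    if (supplier == null) {
      throw new IllegalArgumentException("Unknown deferred id: " + id);
    }
    return supplier.get();
  }

  public static void main(String[] args) {
    DeferredListSketch sketch = new DeferredListSketch();
    String id = sketch.defer(Arrays.asList("/tmp/a.jar", "/tmp/classes"));
    System.out.println("enumerations before resolve: " + sketch.enumerations);
    System.out.println(sketch.resolve(id));
    System.out.println("enumerations after resolve: " + sketch.enumerations);
  }
}
```

Creating the environment then costs one map insertion, and the per-file checks happen only on `resolve`.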
 



Issue Time Tracking
---

Worklog Id: (was: 414897)
Time Spent: 2h 20m  (was: 2h 10m)

> Enumerating artifacts is too expensive in Java
> --
>
> Key: BEAM-9578
> URL: https://issues.apache.org/jira/browse/BEAM-9578
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Heejong Lee
>Priority: Critical
>  Labels: portability
> Fix For: 2.21.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which 
> effectively call *Environments#createOrGetDefaultEnvironment* which causes 
> [artifacts to be 
> computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114].
> This leads to zipping directories for non-jar dependencies.
> Similar problems may exist for Python/Go.





[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414895&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414895
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 02/Apr/20 17:27
Start Date: 02/Apr/20 17:27
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402477389
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ArtifactResolver.java
 ##
 @@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import java.util.Optional;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+
+public interface ArtifactResolver {
 
 Review comment:
   Please add comments to this class and methods.
 



Issue Time Tracking
---

Worklog Id: (was: 414895)

> Enumerating artifacts is too expensive in Java
> --
>
> Key: BEAM-9578
> URL: https://issues.apache.org/jira/browse/BEAM-9578
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Heejong Lee
>Priority: Critical
>  Labels: portability
> Fix For: 2.21.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which 
> effectively call *Environments#createOrGetDefaultEnvironment* which causes 
> [artifacts to be 
> computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114].
> This leads to zipping directories for non-jar dependencies.
> Similar problems may exist for Python/Go.





[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414894&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414894
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 02/Apr/20 17:27
Start Date: 02/Apr/20 17:27
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402476502
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ArtifactResolver.java
 ##
 @@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import java.util.Optional;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+
+public interface ArtifactResolver {
+  void register(ResolutionFn fn);
 
 Review comment:
   We won't need to rely on using Optional if we make registration take a URN 
and ResolutionFn and then the resolver can be found by URN.
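One way the URN-keyed registration could look, as a sketch only. The `ResolutionFn` here is a hypothetical simplification that maps a payload string to a resolved string, and the URN used in `main` is a made-up placeholder:

```java
import java.util.HashMap;
import java.util.Map;

class UrnRegistrySketch {
  /** Hypothetical simplification: payload string in, resolved string out. */
  interface ResolutionFn {
    String resolve(String payload);
  }

  private final Map<String, ResolutionFn> fnsByUrn = new HashMap<>();

  /** Registers exactly one function per type URN; a later call replaces an earlier one. */
  void register(String typeUrn, ResolutionFn fn) {
    fnsByUrn.put(typeUrn, fn);
  }

  /** Looks the function up by URN, so no Optional is needed to signal "not mine". */
  String resolve(String typeUrn, String payload) {
    ResolutionFn fn = fnsByUrn.get(typeUrn);
    if (fn == null) {
      throw new IllegalArgumentException("No resolver registered for URN: " + typeUrn);
    }
    return fn.resolve(payload);
  }

  public static void main(String[] args) {
    UrnRegistrySketch registry = new UrnRegistrySketch();
    registry.register("example:artifact:type:deferred", payload -> "resolved:" + payload);
    System.out.println(registry.resolve("example:artifact:type:deferred", "abc"));
  }
}
```

The trade-off is that a map keyed this way holds one function per URN, so it cannot directly express several partial resolvers for the same type or a resolver that handles many types.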
 



Issue Time Tracking
---

Worklog Id: (was: 414894)
Time Spent: 2h 10m  (was: 2h)

> Enumerating artifacts is too expensive in Java
> --
>
> Key: BEAM-9578
> URL: https://issues.apache.org/jira/browse/BEAM-9578
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Heejong Lee
>Priority: Critical
>  Labels: portability
> Fix For: 2.21.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which 
> effectively call *Environments#createOrGetDefaultEnvironment* which causes 
> [artifacts to be 
> computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114].
> This leads to zipping directories for non-jar dependencies.
> Similar problems may exist for Python/Go.





[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414893&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414893
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 02/Apr/20 17:27
Start Date: 02/Apr/20 17:27
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r402474608
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1206,6 +1210,11 @@ message MavenPayload {
   string repository_url = 2;
 }
 
+message DeferredArtifactPayload {
+  // A id for deferred artifacts.
+  string id = 1;
 
 Review comment:
   I was under the impression we were going to make this a `bytes` field so 
that any deferred information can get passed through and then back to the 
**creator** whether it be an id or a serialized blob of objects or ...
   
   Allowing for `bytes` enables solutions beyond in-memory maps.
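As an illustration of the opaque-payload idea, the sketch below round-trips a list of paths through a `byte[]` using only the JDK. The encoding (a count followed by UTF strings) is an assumption made up for this example; the point is that only the creator needs to understand the bytes it gets back:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OpaquePayloadSketch {
  /** Encodes paths as a count followed by UTF strings; the format is private to the creator. */
  static byte[] encode(List<String> paths) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(paths.size());
      for (String path : paths) {
        out.writeUTF(path);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Decodes a payload produced by encode; everyone else treats it as opaque bytes. */
  static List<String> decode(byte[] payload) {
    List<String> paths = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        paths.add(in.readUTF());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return paths;
  }

  public static void main(String[] args) {
    byte[] payload = encode(Arrays.asList("/tmp/a.jar", "/tmp/classes"));
    System.out.println(decode(payload));
  }
}
```

A string id would still fit this scheme as UTF-8 bytes, while a self-describing payload like the one above would not depend on state held in the creating process.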
 



Issue Time Tracking
---

Worklog Id: (was: 414893)
Time Spent: 2h 10m  (was: 2h)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414370&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414370
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 01/Apr/20 23:19
Start Date: 01/Apr/20 23:19
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #11205: [BEAM-9578] Enumerating 
artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-607536354
 
 
   CC: @chamikaramj 
 



Issue Time Tracking
---

Worklog Id: (was: 414370)
Time Spent: 2h  (was: 1h 50m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414368&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414368
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 01/Apr/20 23:13
Start Date: 01/Apr/20 23:13
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #11205: [BEAM-9578] Enumerating 
artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-607534661
 
 
   CC: @robertwb 
 



Issue Time Tracking
---

Worklog Id: (was: 414368)
Time Spent: 1h 50m  (was: 1h 40m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-04-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414307&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414307
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 01/Apr/20 21:49
Start Date: 01/Apr/20 21:49
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r401930784
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1146,33 +1146,41 @@ message StandardArtifacts {
   enum Types {
 // A URN for locally-accessible artifact files.
 // payload: ArtifactFilePayload
-FILE = 0 [(beam_urn) = "beam:artifact:type:file:v1"];
+FILE  = 0 [(beam_urn) = "beam:artifact:type:file:v1"];
 
 // A URN for artifacts described by URLs.
 // payload: ArtifactUrlPayload
-URL  = 1 [(beam_urn) = "beam:artifact:type:url:v1"];
+URL   = 1 [(beam_urn) = "beam:artifact:type:url:v1"];
 
 // A URN for artifacts embedded in ArtifactInformation proto.
 // payload: EmbeddedFilePayload.
-EMBEDDED = 2 [(beam_urn) = "beam:artifact:type:embedded:v1"];
+EMBEDDED  = 2 [(beam_urn) = "beam:artifact:type:embedded:v1"];
 
 // A URN for Python artifacts hosted on PYPI.
 // payload: PypiPayload
-PYPI = 3 [(beam_urn) = "beam:artifact:type:pypi:v1"];
+PYPI  = 3 [(beam_urn) = "beam:artifact:type:pypi:v1"];
 
 // A URN for Java artifacts hosted on a Maven repository.
 // payload: MavenPayload
-MAVEN= 4 [(beam_urn) = "beam:artifact:type:maven:v1"];
+MAVEN = 4 [(beam_urn) = "beam:artifact:type:maven:v1"];
+
+// A URN for locally-accessible artifact directory.
+// payload: ArtifactDirectoryPayload
+DIRECTORY = 5 [(beam_urn) = "beam:artifact:type:directory:v1"];
   }
   enum Roles {
 // A URN for staging-to role.
 // payload: ArtifactStagingToRolePayload
 STAGING_TO  = 0 [(beam_urn) = "beam:artifact:role:staging_to:v1"];
+
+// A URN for unzip-to role.
+// payload: ArtifactUnzipToRolePayload
+UNZIP_TO= 1 [(beam_urn) = "beam:artifact:role:unzip_to:v1"];
   }
 }
 
 message ArtifactFilePayload {
-  // a string for an artifact path e.g. "/tmp/foo.jar"
+  // a string for an artifact file path e.g. "/tmp/foo.jar"
   string path = 1;
 
   // The hex-encoded sha256 checksum of the artifact.
 
 Review comment:
   adding deferred artifacts and populating sha256.
 



Issue Time Tracking
---

Worklog Id: (was: 414307)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-03-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=412540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-412540
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 30/Mar/20 20:02
Start Date: 30/Mar/20 20:02
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#discussion_r400458940
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1146,33 +1146,41 @@ message StandardArtifacts {
   enum Types {
 // A URN for locally-accessible artifact files.
 // payload: ArtifactFilePayload
-FILE = 0 [(beam_urn) = "beam:artifact:type:file:v1"];
+FILE  = 0 [(beam_urn) = "beam:artifact:type:file:v1"];
 
 // A URN for artifacts described by URLs.
 // payload: ArtifactUrlPayload
-URL  = 1 [(beam_urn) = "beam:artifact:type:url:v1"];
+URL   = 1 [(beam_urn) = "beam:artifact:type:url:v1"];
 
 // A URN for artifacts embedded in ArtifactInformation proto.
 // payload: EmbeddedFilePayload.
-EMBEDDED = 2 [(beam_urn) = "beam:artifact:type:embedded:v1"];
+EMBEDDED  = 2 [(beam_urn) = "beam:artifact:type:embedded:v1"];
 
 // A URN for Python artifacts hosted on PYPI.
 // payload: PypiPayload
-PYPI = 3 [(beam_urn) = "beam:artifact:type:pypi:v1"];
+PYPI  = 3 [(beam_urn) = "beam:artifact:type:pypi:v1"];
 
 // A URN for Java artifacts hosted on a Maven repository.
 // payload: MavenPayload
-MAVEN= 4 [(beam_urn) = "beam:artifact:type:maven:v1"];
+MAVEN = 4 [(beam_urn) = "beam:artifact:type:maven:v1"];
+
+// A URN for locally-accessible artifact directory.
+// payload: ArtifactDirectoryPayload
+DIRECTORY = 5 [(beam_urn) = "beam:artifact:type:directory:v1"];
   }
   enum Roles {
 // A URN for staging-to role.
 // payload: ArtifactStagingToRolePayload
 STAGING_TO  = 0 [(beam_urn) = "beam:artifact:role:staging_to:v1"];
+
+// A URN for unzip-to role.
+// payload: ArtifactUnzipToRolePayload
+UNZIP_TO= 1 [(beam_urn) = "beam:artifact:role:unzip_to:v1"];
   }
 }
 
 message ArtifactFilePayload {
-  // a string for an artifact path e.g. "/tmp/foo.jar"
+  // a string for an artifact file path e.g. "/tmp/foo.jar"
   string path = 1;
 
   // The hex-encoded sha256 checksum of the artifact.
 
 Review comment:
   It doesn't seem like we are populating sha256 here.
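
   For reference, the hex-encoded sha256 that the proto comment describes could be computed along these lines. This is a minimal sketch in Python, not the PR's Java code; the helper name `sha256_hex` and the chunk size are assumptions made for illustration.

```python
import hashlib


def sha256_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex-encoded SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large artifacts are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

   Hashing reads every byte of the artifact, so it costs about as much I/O as staging the file once. That is presumably part of why the checksum is tied to the deferred-artifact work in this thread.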
 



Issue Time Tracking
---

Worklog Id: (was: 412540)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=410566&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410566
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 26/Mar/20 19:59
Start Date: 26/Mar/20 19:59
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #11205: [BEAM-9578] Enumerating 
artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-604654049
 
 
   R: @lukecwik
 



Issue Time Tracking
---

Worklog Id: (was: 410566)
Time Spent: 1h 20m  (was: 1h 10m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-03-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=409734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-409734
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 25/Mar/20 19:04
Start Date: 25/Mar/20 19:04
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #11205: [BEAM-9578] Enumerating 
artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-604028675
 
 
   Added the directory type and the unzip_to role. I didn't create an interface for 
resolving artifacts since we only have one resolver (resolving a directory) for now. We may 
refactor later to support more resolvers.
 



Issue Time Tracking
---

Worklog Id: (was: 409734)
Time Spent: 1h 10m  (was: 1h)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=408959&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408959
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 24/Mar/20 17:34
Start Date: 24/Mar/20 17:34
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-603394246
 
 
   In this case, these are directories to be added to the classpath, so it may 
be preferable to keep them zipped, though such a role does make sense. (This is 
also an example of a new "directory" type that could be "resolved" to a 
zipfile.)
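
   As a rough illustration of "resolving" a directory artifact to a zipfile only when it is actually needed, here is a minimal Python sketch. The function name `zip_directory` is hypothetical, and the real change in this PR lives in the Java SDK; the sketch assumes the destination archive is written outside the source directory.

```python
import os
import zipfile


def zip_directory(src_dir: str, dest_zip: str) -> str:
    """Zip the contents of `src_dir` into `dest_zip` and return its path.

    Assumes `dest_zip` is located outside `src_dir`.
    """
    with zipfile.ZipFile(dest_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # make the traversal order deterministic
            for name in sorted(files):
                full = os.path.join(root, name)
                # Store entries relative to the directory root.
                zf.write(full, os.path.relpath(full, src_dir))
    return dest_zip
```

   Deferring this call until staging time is what avoids paying the zipping cost every time the environment is merely enumerated.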
 



Issue Time Tracking
---

Worklog Id: (was: 408959)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=408842&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408842
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 24/Mar/20 15:14
Start Date: 24/Mar/20 15:14
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-603297904
 
 
   Should we add a `directory` option, and a role that would be to `unzip`?
 



Issue Time Tracking
---

Worklog Id: (was: 408842)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=408841&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408841
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 24/Mar/20 15:13
Start Date: 24/Mar/20 15:13
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-603297142
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 408841)
Time Spent: 40m  (was: 0.5h)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=408660&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408660
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 24/Mar/20 10:46
Start Date: 24/Mar/20 10:46
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-603165704
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 408660)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=408659&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408659
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 24/Mar/20 10:45
Start Date: 24/Mar/20 10:45
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205#issuecomment-603165704
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 408659)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java

2020-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=408619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408619
 ]

ASF GitHub Bot logged work on BEAM-9578:


Author: ASF GitHub Bot
Created on: 24/Mar/20 09:36
Start Date: 24/Mar/20 09:36
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11205: [BEAM-9578] 
Enumerating artifacts is too expensive in Java
URL: https://github.com/apache/beam/pull/11205
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [