[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=418010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-418010 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 07/Apr/20 21:57 Start Date: 07/Apr/20 21:57 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 418010) Time Spent: 8h 10m (was: 8h) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 8h 10m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=418008&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-418008 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 07/Apr/20 21:56 Start Date: 07/Apr/20 21:56 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r405136162 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java ## @@ -207,31 +210,94 @@ public static Environment createProcessEnvironment( } } - public static Collection getArtifacts(PipelineOptions options) { -Set pathsToStage = Sets.newHashSet(); -List stagingFiles = options.as(PortablePipelineOptions.class).getFilesToStage(); -if (stagingFiles != null) { - pathsToStage.addAll(stagingFiles); -} - -ImmutableList.Builder filesToStage = ImmutableList.builder(); + private static List getArtifacts(List stagingFiles) { +Set pathsToStage = Sets.newHashSet(stagingFiles); +ImmutableList.Builder artifactsBuilder = ImmutableList.builder(); for (String path : pathsToStage) { File file = new File(path); - if (new File(path).exists()) { -// Spurious items get added to the classpath. Filter by just those that exist. + // Spurious items get added to the classpath. Filter by just those that exist. + if (file.exists()) { +ArtifactInformation.Builder artifactBuilder = ArtifactInformation.newBuilder(); + artifactBuilder.setTypeUrn(BeamUrns.getUrn(StandardArtifacts.Types.FILE)); + artifactBuilder.setRoleUrn(BeamUrns.getUrn(StandardArtifacts.Roles.STAGING_TO)); +artifactBuilder.setRolePayload( +RunnerApi.ArtifactStagingToRolePayload.newBuilder() +.setStagedName(createStagingFileName(file)) +.build() +.toByteString()); if (file.isDirectory()) { - // Zip up directories so we can upload them to the artifact service. + File zippedFile; Review comment: Nit, there seems to be a fair amount of duplication between these two case. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 418008) Time Spent: 7h 50m (was: 7h 40m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 7h 50m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=418009&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-418009 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 07/Apr/20 21:56 Start Date: 07/Apr/20 21:56 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r405136418 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java ## @@ -207,31 +210,94 @@ public static Environment createProcessEnvironment( } } - public static Collection getArtifacts(PipelineOptions options) { -Set pathsToStage = Sets.newHashSet(); -List stagingFiles = options.as(PortablePipelineOptions.class).getFilesToStage(); -if (stagingFiles != null) { - pathsToStage.addAll(stagingFiles); -} - -ImmutableList.Builder filesToStage = ImmutableList.builder(); + private static List getArtifacts(List stagingFiles) { +Set pathsToStage = Sets.newHashSet(stagingFiles); +ImmutableList.Builder artifactsBuilder = ImmutableList.builder(); for (String path : pathsToStage) { File file = new File(path); - if (new File(path).exists()) { -// Spurious items get added to the classpath. Filter by just those that exist. + // Spurious items get added to the classpath. Filter by just those that exist. + if (file.exists()) { +ArtifactInformation.Builder artifactBuilder = ArtifactInformation.newBuilder(); + artifactBuilder.setTypeUrn(BeamUrns.getUrn(StandardArtifacts.Types.FILE)); + artifactBuilder.setRoleUrn(BeamUrns.getUrn(StandardArtifacts.Roles.STAGING_TO)); +artifactBuilder.setRolePayload( +RunnerApi.ArtifactStagingToRolePayload.newBuilder() +.setStagedName(createStagingFileName(file)) +.build() +.toByteString()); if (file.isDirectory()) { - // Zip up directories so we can upload them to the artifact service. + File zippedFile; + HashCode hashCode; try { -filesToStage.add(createArtifactInformation(zipDirectory(file))); +zippedFile = zipDirectory(file); +hashCode = Files.asByteSource(zippedFile).hash(Hashing.sha256()); } catch (IOException e) { throw new RuntimeException(e); } + artifactsBuilder.add( + artifactBuilder + .setTypePayload( + RunnerApi.ArtifactFilePayload.newBuilder() + .setPath(zippedFile.getPath()) + .setSha256(hashCode.toString()) + .build() + .toByteString()) + .build()); } else { - filesToStage.add(createArtifactInformation(file)); + HashCode hashCode; + try { +hashCode = Files.asByteSource(file).hash(Hashing.sha256()); + } catch (IOException e) { +throw new RuntimeException(e); Review comment: Or would it be better to let the method throw an IOException? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 418009) Time Spent: 8h (was: 7h 50m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 8h > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=417967&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-417967 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 07/Apr/20 20:45 Start Date: 07/Apr/20 20:45 Worklog Time Spent: 10m Work Description: ihji commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-610611127 @robertwb All comments are addressed. PTAL! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 417967) Time Spent: 7h 40m (was: 7.5h) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 7h 40m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=417859&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-417859 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 07/Apr/20 18:05 Start Date: 07/Apr/20 18:05 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-610537944 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 417859) Time Spent: 7.5h (was: 7h 20m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 7.5h > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=417848&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-417848 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 07/Apr/20 17:49 Start Date: 07/Apr/20 17:49 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-610530293 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 417848) Time Spent: 7h 20m (was: 7h 10m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 7h 20m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=417847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-417847 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 07/Apr/20 17:49 Start Date: 07/Apr/20 17:49 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-610530129 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 417847) Time Spent: 7h 10m (was: 7h) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 7h 10m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=417789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-417789 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 07/Apr/20 16:13 Start Date: 07/Apr/20 16:13 Worklog Time Spent: 10m Work Description: mxm commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-610480293 Since this continues to affect many Beam developers with test timeouts, do you think we could commit an intermediate fix for this? Or perhaps revert the experimental feature until a fix has been developed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 417789) Time Spent: 7h (was: 6h 50m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 7h > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=417158&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-417158 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 06/Apr/20 21:57 Start Date: 06/Apr/20 21:57 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-610058685 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 417158) Time Spent: 6h 50m (was: 6h 40m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 6h 50m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=417157&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-417157 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 06/Apr/20 21:55 Start Date: 06/Apr/20 21:55 Worklog Time Spent: 10m Work Description: ihji commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r404412636 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultArtifactResolver.java ## @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction; + +import java.util.List; +import java.util.Map; +import java.util.function.Function; +import java.util.stream.Collectors; +import java.util.stream.Stream; +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; + +/** + * A default artifact resolver. This resolver applies {@link ResolutionFn} in the reversed order + * they registered i.e. the function registered later overrides the earlier one if they resolve the + * same artifact. + */ +public class DefaultArtifactResolver implements ArtifactResolver { + public static final ArtifactResolver INSTANCE = new DefaultArtifactResolver(); + + private List fns = + Lists.newArrayList( + (info) -> { +if (BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) { + return ImmutableList.of(info); +} else { + return ImmutableList.of(); Review comment: Done. Optional list makes three choices: Failure, Success with empty output and Success with a list of artifacts. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 417157) Time Spent: 6h 40m (was: 6.5h) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 6h 40m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=417156&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-417156 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 06/Apr/20 21:54 Start Date: 06/Apr/20 21:54 Worklog Time Spent: 10m Work Description: ihji commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r404411951 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java ## @@ -214,24 +220,90 @@ public static Environment createProcessEnvironment( pathsToStage.addAll(stagingFiles); } -ImmutableList.Builder filesToStage = ImmutableList.builder(); +ImmutableList.Builder> lazyArtifactsBuilder = +ImmutableList.builder(); for (String path : pathsToStage) { Review comment: This for loop is fairly cheap and from stream of Suppliers we can easily get additional performance benefits by creating parallelStream. When we consider parallelizing expensive computations, some boilerplate codes are needed anyway in `getNonDeferredArtifacts()`. I think building a stream is a nice way to abstract them out. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 417156) Time Spent: 6.5h (was: 6h 20m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 6.5h > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415882&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415882 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 04/Apr/20 01:44 Start Date: 04/Apr/20 01:44 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r403407082 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1206,6 +1210,15 @@ message MavenPayload { string repository_url = 2; } +message DeferredArtifactPayload { + // A unique string identifier assigned by the creator of this payload. The creator may use this key to confirm + // whether they can parse the data. + string key = 1; Review comment: This is going to have to get revamped for XLang and since it isn't being exported outside of the SDK for portable runners we can easily change it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415882) Time Spent: 6h 20m (was: 6h 10m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 6h 20m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415880&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415880 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 04/Apr/20 01:43 Start Date: 04/Apr/20 01:43 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-608952454 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415880) Time Spent: 6h 10m (was: 6h) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415873&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415873 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 04/Apr/20 01:31 Start Date: 04/Apr/20 01:31 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r403403766 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1206,6 +1210,15 @@ message MavenPayload { string repository_url = 2; } +message DeferredArtifactPayload { + // A unique string identifier assigned by the creator of this payload. The creator may use this key to confirm + // whether they can parse the data. + string key = 1; Review comment: Should this be uid? Any collisions here could be bad... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415873) Time Spent: 6h (was: 5h 50m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 6h > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415874 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 04/Apr/20 01:31 Start Date: 04/Apr/20 01:31 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r403405278 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java ## @@ -214,24 +220,90 @@ public static Environment createProcessEnvironment( pathsToStage.addAll(stagingFiles); } -ImmutableList.Builder filesToStage = ImmutableList.builder(); +ImmutableList.Builder> lazyArtifactsBuilder = +ImmutableList.builder(); for (String path : pathsToStage) { Review comment: Don't we want this for loop to be lazy? Rather than introducing intermediate streams of Suppliers, I think we could just rename the existing `getArtifacts()` something like `getNonDeferredArtifacts()` and then call it during resolution. ``` if (key.equals(deferredArtifactPayload.getKey())) { return getNonDeferredArtifacts(options); } ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415874) Time Spent: 6h (was: 5h 50m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 6h > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415875 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 04/Apr/20 01:31 Start Date: 04/Apr/20 01:31 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r403403332 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultArtifactResolver.java ## @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction; + +import java.util.List; +import java.util.Map; +import java.util.function.Function; +import java.util.stream.Collectors; +import java.util.stream.Stream; +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists; + +/** + * A default artifact resolver. This resolver applies {@link ResolutionFn} in the reversed order + * they registered i.e. the function registered later overrides the earlier one if they resolve the + * same artifact. + */ +public class DefaultArtifactResolver implements ArtifactResolver { + public static final ArtifactResolver INSTANCE = new DefaultArtifactResolver(); + + private List fns = + Lists.newArrayList( + (info) -> { +if (BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) { + return ImmutableList.of(info); +} else { + return ImmutableList.of(); Review comment: Is the empty list special? In particular sometimes a deferred artifact may resolve to nothing, which is different than not being able to be resolved... I think we still need optional or null or an exception to denote unresolveable by this resolver. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415875) Time Spent: 6h (was: 5h 50m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 6h > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415866&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415866 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 04/Apr/20 01:12 Start Date: 04/Apr/20 01:12 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r403402959 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1206,6 +1210,11 @@ message MavenPayload { string repository_url = 2; } +message DeferredArtifactPayload { + // A id for deferred artifacts. + string id = 1; Review comment: As discussed (but putting here for the record) having a proxy artifact type could solve this issue. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415866) Time Spent: 5h 50m (was: 5h 40m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415853&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415853 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 04/Apr/20 00:03 Start Date: 04/Apr/20 00:03 Worklog Time Spent: 10m Work Description: ihji commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r403390422 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultArtifactResolver.java ## @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction; + +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; +import org.apache.beam.model.pipeline.v1.RunnerApi; + +public class DefaultArtifactResolver implements ArtifactResolver { + public static ArtifactResolver INSTANCE = new DefaultArtifactResolver(); + + private ResolutionFn resolver = + (info) -> { +if (BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) { + return Optional.of(info); +} else { + return Optional.empty(); +} + }; + + @Override + public void register(ResolutionFn fn) { +resolver = Review comment: done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415853) Time Spent: 5h 40m (was: 5.5h) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415852&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415852 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 03/Apr/20 23:59 Start Date: 03/Apr/20 23:59 Worklog Time Spent: 10m Work Description: ihji commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-608857763 @lukecwik Ready to merge. Please trigger the test and take a final look. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415852) Time Spent: 5.5h (was: 5h 20m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 5.5h > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415824&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415824 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 03/Apr/20 23:22 Start Date: 03/Apr/20 23:22 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-608803659 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415824) Time Spent: 5h 20m (was: 5h 10m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415288&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415288 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 03/Apr/20 02:25 Start Date: 03/Apr/20 02:25 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402698752 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1206,6 +1210,11 @@ message MavenPayload { string repository_url = 2; } +message DeferredArtifactPayload { + // A id for deferred artifacts. + string id = 1; Review comment: Thats a good point about the key and in general this will become a problem for all artifacts since none of them have unique keys associated with them unless an intermediary resolves the artifact immediately and possibly "renames" the contents to make it unique. If we ever want to support multiple layers of expansion for XLang we'll want to proxy any artifact resolution/retrieval calls through the layers and not require each layer to have a *copy* of the artifact. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415288) Time Spent: 5h 10m (was: 5h) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415284&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415284 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 03/Apr/20 02:23 Start Date: 03/Apr/20 02:23 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402699096 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1206,6 +1210,15 @@ message MavenPayload { string repository_url = 2; } +message DeferredArtifactPayload { + // A unique string identifier assigned by the creator of this payload. The creator may use this key to confirm + // whether they can parse the data. + string key = 1; + + // A data for deferred artifacts. Interpretation of bytes is delegated to the creator of this payload. Review comment: ```suggestion // Data for deferred artifacts. Interpretation of bytes is delegated to the creator of this payload. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415284) Time Spent: 4h 50m (was: 4h 40m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415285&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415285 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 03/Apr/20 02:23 Start Date: 03/Apr/20 02:23 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402699398 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultArtifactResolver.java ## @@ -0,0 +1,93 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction; + +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; + +/** + * A default artifact resolver. This resolver applies {@link ResolutionFn} first matched in the + * order they registered. + */ +public class DefaultArtifactResolver implements ArtifactResolver { + public static final ArtifactResolver INSTANCE = new DefaultArtifactResolver(); + + private ResolutionFn resolver = + (info) -> { +if (BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) { + return ImmutableList.of(info); +} else { + return ImmutableList.of(); +} + }; + + @Override + public void register(ResolutionFn fn) { +resolver = +(info) -> { + List resolved = fn.resolve(info); + if (!resolved.isEmpty()) { +return resolved; + } else { +return resolver.resolve(info); + } +}; + } + + @Override + public RunnerApi.Pipeline resolveArtifacts(RunnerApi.Pipeline pipeline) { +ImmutableMap.Builder environmentMapBuilder = +ImmutableMap.builder(); +for (Map.Entry entry : +pipeline.getComponents().getEnvironmentsMap().entrySet()) { + List resolvedDependencies = + entry + .getValue() + .getDependenciesList() + .parallelStream() + .flatMap( + (info) -> { +List resolved = resolver.resolve(info); +if (resolved.isEmpty()) { + throw new RuntimeException( + String.format("cannot resolve artifact information: %s", info)); Review comment: ```suggestion String.format("Cannot resolve artifact information: %s", info)); ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415285) Time Spent: 5h (was: 4h 50m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 5h > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/E
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415286&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415286 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 03/Apr/20 02:23 Start Date: 03/Apr/20 02:23 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402700601 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java ## @@ -214,24 +220,87 @@ public static Environment createProcessEnvironment( pathsToStage.addAll(stagingFiles); } -ImmutableList.Builder filesToStage = ImmutableList.builder(); +ImmutableList.Builder> lazyArtifactsBuilder = +ImmutableList.builder(); for (String path : pathsToStage) { File file = new File(path); - if (new File(path).exists()) { -// Spurious items get added to the classpath. Filter by just those that exist. -if (file.isDirectory()) { - // Zip up directories so we can upload them to the artifact service. - try { -filesToStage.add(createArtifactInformation(zipDirectory(file))); - } catch (IOException e) { -throw new RuntimeException(e); - } -} else { - filesToStage.add(createArtifactInformation(file)); -} + // Spurious items get added to the classpath. Filter by just those that exist. + if (file.exists()) { +ArtifactInformation.Builder artifactBuilder = ArtifactInformation.newBuilder(); + artifactBuilder.setTypeUrn(BeamUrns.getUrn(StandardArtifacts.Types.FILE)); + artifactBuilder.setRoleUrn(BeamUrns.getUrn(StandardArtifacts.Roles.STAGING_TO)); +artifactBuilder.setRolePayload( +RunnerApi.ArtifactStagingToRolePayload.newBuilder() +.setStagedName(createStagingFileName(file)) +.build() +.toByteString()); +lazyArtifactsBuilder.add( +file.isDirectory() +? () -> { + File zippedFile; + HashCode hashCode; + try { +zippedFile = zipDirectory(file); +hashCode = Files.asByteSource(zippedFile).hash(Hashing.sha256()); + } catch (IOException e) { +throw new RuntimeException(e); + } + return artifactBuilder + .setTypePayload( + RunnerApi.ArtifactFilePayload.newBuilder() + .setPath(zippedFile.getPath()) + .setSha256(hashCode.toString()) + .build() + .toByteString()) + .build(); +} +: () -> { + HashCode hashCode; + try { +hashCode = Files.asByteSource(file).hash(Hashing.sha256()); + } catch (IOException e) { +throw new RuntimeException(e); + } + return artifactBuilder + .setTypePayload( + RunnerApi.ArtifactFilePayload.newBuilder() + .setPath(file.getPath()) + .setSha256(hashCode.toString()) + .build() + .toByteString()) + .build(); +}); } } -return filesToStage.build(); + +List> lazyArtifacts = lazyArtifactsBuilder.build(); +String id = UUID.randomUUID().toString(); +DefaultArtifactResolver.INSTANCE.register( +(info) -> { + if (BeamUrns.getUrn(StandardArtifacts.Types.DEFERRED).equals(info.getTypeUrn())) { +RunnerApi.DeferredArtifactPayload deferredArtifactPayload; +try { + deferredArtifactPayload = + RunnerApi.DeferredArtifactPayload.parseFrom(info.getTypePayload()); +} catch (InvalidProtocolBufferException e) { + throw new RuntimeException("Error parsing deferred artifact payload.", e); +} +if (id.equals(deferredArtifactPayload.getKey())) { + return lazyArtifacts.stream().map(Supplier::get).collect(Collectors.toList()); +} else { + return ImmutableList.of(); +} + } else { +return ImmutableList.of(); + } +}); + +return ImmutableList.of( +ArtifactInformation.newBuilder() +.setTypeUrn(BeamUrns.getUrn(StandardArtifacts.Types.DEFERRED)) +.setT
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415281&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415281 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 03/Apr/20 02:16 Start Date: 03/Apr/20 02:16 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402697023 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1206,6 +1210,15 @@ message MavenPayload { string repository_url = 2; } +message DeferredArtifactPayload { + // A unique string identifier assigned by the creator of this payload. The creator may use this key to confirm + // whether they can parse the data. + string key = 1; + + // A data for deferred artifacts. Interpretation of bytes is delegated to the creator of this payload. + bytes data = 2; Review comment: ```suggestion bytes payload = 2; ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415281) Time Spent: 4h 40m (was: 4.5h) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415280 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 03/Apr/20 02:15 Start Date: 03/Apr/20 02:15 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402698752 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1206,6 +1210,11 @@ message MavenPayload { string repository_url = 2; } +message DeferredArtifactPayload { + // A id for deferred artifacts. + string id = 1; Review comment: Thats a good point about the key and in general this will become a problem for all artifacts since none of them have unique keys associated with them unless an intermediary resolves the artifact immediately and possibly "renames" the contents to make it unique. If we ever want to support multiple layers of expansion for XLang we'll want to proxy any artifact resolution/retrieval calls through the layers. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415280) Time Spent: 4.5h (was: 4h 20m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415279&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415279 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 03/Apr/20 02:15 Start Date: 03/Apr/20 02:15 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402697023 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1206,6 +1210,15 @@ message MavenPayload { string repository_url = 2; } +message DeferredArtifactPayload { + // A unique string identifier assigned by the creator of this payload. The creator may use this key to confirm + // whether they can parse the data. + string key = 1; + + // A data for deferred artifacts. Interpretation of bytes is delegated to the creator of this payload. + bytes data = 2; Review comment: ```suggestion bytes payload = 2; ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415279) Time Spent: 4h 20m (was: 4h 10m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415258&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415258 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 03/Apr/20 01:31 Start Date: 03/Apr/20 01:31 Worklog Time Spent: 10m Work Description: ihji commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402688050 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ArtifactResolver.java ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction; + +import java.util.Optional; +import org.apache.beam.model.pipeline.v1.RunnerApi; + +public interface ArtifactResolver { + void register(ResolutionFn fn); + + RunnerApi.Pipeline resolveArtifacts(RunnerApi.Pipeline pipeline); + + interface ResolutionFn { +Optional resolve(RunnerApi.ArtifactInformation info); Review comment: changed to List. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415258) Time Spent: 4h 10m (was: 4h) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415255&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415255 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 03/Apr/20 01:23 Start Date: 03/Apr/20 01:23 Worklog Time Spent: 10m Work Description: ihji commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402685778 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1206,6 +1210,11 @@ message MavenPayload { string repository_url = 2; } +message DeferredArtifactPayload { + // A id for deferred artifacts. + string id = 1; Review comment: Don't we need at least a key field (or urn or identifier) which can be used to check whether the payload is parsable by the creator? Otherwise, it would be pretty hard to know where the bytes payload originally came from. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415255) Time Spent: 4h (was: 3h 50m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 4h > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415248&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415248 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 03/Apr/20 01:18 Start Date: 03/Apr/20 01:18 Worklog Time Spent: 10m Work Description: ihji commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402684550 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java ## @@ -215,22 +221,76 @@ public static Environment createProcessEnvironment( } ImmutableList.Builder filesToStage = ImmutableList.builder(); +ImmutableMap.Builder> +lazyArtifactsBuilder = ImmutableMap.builder(); for (String path : pathsToStage) { File file = new File(path); - if (new File(path).exists()) { -// Spurious items get added to the classpath. Filter by just those that exist. -if (file.isDirectory()) { - // Zip up directories so we can upload them to the artifact service. - try { -filesToStage.add(createArtifactInformation(zipDirectory(file))); - } catch (IOException e) { -throw new RuntimeException(e); - } -} else { - filesToStage.add(createArtifactInformation(file)); -} + // Spurious items get added to the classpath. Filter by just those that exist. + if (file.exists()) { +String id = UUID.randomUUID().toString(); Review comment: done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415248) Time Spent: 3h 50m (was: 3h 40m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415247&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415247 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 03/Apr/20 01:18 Start Date: 03/Apr/20 01:18 Worklog Time Spent: 10m Work Description: ihji commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402684484 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultArtifactResolver.java ## @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction; + +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; +import org.apache.beam.model.pipeline.v1.RunnerApi; + +public class DefaultArtifactResolver implements ArtifactResolver { + public static ArtifactResolver INSTANCE = new DefaultArtifactResolver(); + + private ResolutionFn resolver = + (info) -> { +if (BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) { + return Optional.of(info); +} else { + return Optional.empty(); +} + }; + + @Override + public void register(ResolutionFn fn) { +resolver = +(info) -> { + Optional resolved = fn.resolve(info); + if (resolved.isPresent()) { +return resolved; + } else { +return resolver.resolve(info); + } +}; + } + + @Override + public RunnerApi.Pipeline resolveArtifacts(RunnerApi.Pipeline pipeline) { Review comment: done :smile: This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415247) Time Spent: 3h 40m (was: 3.5h) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415246&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415246 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 03/Apr/20 01:18 Start Date: 03/Apr/20 01:18 Worklog Time Spent: 10m Work Description: ihji commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402684430 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultArtifactResolver.java ## @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction; + +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; +import org.apache.beam.model.pipeline.v1.RunnerApi; + +public class DefaultArtifactResolver implements ArtifactResolver { Review comment: done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415246) Time Spent: 3.5h (was: 3h 20m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=415245&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-415245 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 03/Apr/20 01:18 Start Date: 03/Apr/20 01:18 Worklog Time Spent: 10m Work Description: ihji commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402684405 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ArtifactResolver.java ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction; + +import java.util.Optional; +import org.apache.beam.model.pipeline.v1.RunnerApi; + +public interface ArtifactResolver { Review comment: done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 415245) Time Spent: 3h 20m (was: 3h 10m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414987&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414987 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 02/Apr/20 19:20 Start Date: 02/Apr/20 19:20 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-608055038 > Friendly question: is this PR close? I'm still having to trigger Java precommits 3-4 times in order to get a green run, partially due to these timeouts. We have agreed on the solution and this PR represents an early version of it so I would say that we are close. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 414987) Time Spent: 3h 10m (was: 3h) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414961 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 02/Apr/20 18:37 Start Date: 02/Apr/20 18:37 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-608034034 Friendly question: is this PR close? I'm still having to trigger Java precommits 3-4 times in order to get a green run, partially due to these timeouts. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 414961) Time Spent: 3h (was: 2h 50m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 3h > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414922 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 02/Apr/20 17:50 Start Date: 02/Apr/20 17:50 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402501898 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java ## @@ -215,22 +221,76 @@ public static Environment createProcessEnvironment( } ImmutableList.Builder filesToStage = ImmutableList.builder(); +ImmutableMap.Builder> +lazyArtifactsBuilder = ImmutableMap.builder(); for (String path : pathsToStage) { File file = new File(path); - if (new File(path).exists()) { -// Spurious items get added to the classpath. Filter by just those that exist. -if (file.isDirectory()) { - // Zip up directories so we can upload them to the artifact service. - try { -filesToStage.add(createArtifactInformation(zipDirectory(file))); - } catch (IOException e) { -throw new RuntimeException(e); - } -} else { - filesToStage.add(createArtifactInformation(file)); -} + // Spurious items get added to the classpath. Filter by just those that exist. + if (file.exists()) { +String id = UUID.randomUUID().toString(); Review comment: Yes, I think the goal is to avoid even this enumeration until we actually need it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 414922) Time Spent: 2h 40m (was: 2.5h) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414923&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414923 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 02/Apr/20 17:50 Start Date: 02/Apr/20 17:50 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402499335 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ArtifactResolver.java ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction; + +import java.util.Optional; +import org.apache.beam.model.pipeline.v1.RunnerApi; + +public interface ArtifactResolver { + void register(ResolutionFn fn); + + RunnerApi.Pipeline resolveArtifacts(RunnerApi.Pipeline pipeline); + + interface ResolutionFn { +Optional resolve(RunnerApi.ArtifactInformation info); Review comment: As with the resolution API, one may want to attempt to resolve multiple artifacts (e.g. maven dependencies) simultaneously. One may also need to return multiple artifacts as the resolution of a single artifact (e.g. the deferred "ambient environment" one). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 414923) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414921&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414921 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 02/Apr/20 17:50 Start Date: 02/Apr/20 17:50 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402500648 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultArtifactResolver.java ## @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction; + +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; +import org.apache.beam.model.pipeline.v1.RunnerApi; + +public class DefaultArtifactResolver implements ArtifactResolver { + public static ArtifactResolver INSTANCE = new DefaultArtifactResolver(); + + private ResolutionFn resolver = + (info) -> { +if (BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) { + return Optional.of(info); +} else { + return Optional.empty(); +} + }; + + @Override + public void register(ResolutionFn fn) { +resolver = Review comment: I wonder if having an explicit List would be easier to understand than the implicit chaining in these abstract classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 414921) Time Spent: 2h 40m (was: 2.5h) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414924&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414924 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 02/Apr/20 17:50 Start Date: 02/Apr/20 17:50 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402494768 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ArtifactResolver.java ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction; + +import java.util.Optional; +import org.apache.beam.model.pipeline.v1.RunnerApi; + +public interface ArtifactResolver { + void register(ResolutionFn fn); Review comment: I agree. However, this might be more limited. E.g. one could have a more than one resolver per type, each of which can only resolve a subset, or alternatively one could want to resolve more than one type (or any type), e.g. a proxying resolver. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 414924) Time Spent: 2h 50m (was: 2h 40m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414920&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414920 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 02/Apr/20 17:50 Start Date: 02/Apr/20 17:50 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402492044 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1206,6 +1210,11 @@ message MavenPayload { string repository_url = 2; } +message DeferredArtifactPayload { + // A id for deferred artifacts. + string id = 1; Review comment: +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 414920) Time Spent: 2.5h (was: 2h 20m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414896&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414896 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 02/Apr/20 17:27 Start Date: 02/Apr/20 17:27 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402479848 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultArtifactResolver.java ## @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction; + +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; +import org.apache.beam.model.pipeline.v1.RunnerApi; + +public class DefaultArtifactResolver implements ArtifactResolver { Review comment: Class comment This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 414896) Time Spent: 2h 20m (was: 2h 10m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414898&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414898 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 02/Apr/20 17:27 Start Date: 02/Apr/20 17:27 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402482365 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DefaultArtifactResolver.java ## @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction; + +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; +import org.apache.beam.model.pipeline.v1.RunnerApi; + +public class DefaultArtifactResolver implements ArtifactResolver { + public static ArtifactResolver INSTANCE = new DefaultArtifactResolver(); + + private ResolutionFn resolver = + (info) -> { +if (BeamUrns.getUrn(RunnerApi.StandardArtifacts.Types.FILE).equals(info.getTypeUrn())) { + return Optional.of(info); +} else { + return Optional.empty(); +} + }; + + @Override + public void register(ResolutionFn fn) { +resolver = +(info) -> { + Optional resolved = fn.resolve(info); + if (resolved.isPresent()) { +return resolved; + } else { +return resolver.resolve(info); + } +}; + } + + @Override + public RunnerApi.Pipeline resolveArtifacts(RunnerApi.Pipeline pipeline) { Review comment: The level of nesting in this method is getting a little silly. Use local variables to logically describe what your doing and consider dropping using stream This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 414898) Time Spent: 2h 20m (was: 2h 10m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414897 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 02/Apr/20 17:27 Start Date: 02/Apr/20 17:27 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402482918 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java ## @@ -215,22 +221,76 @@ public static Environment createProcessEnvironment( } ImmutableList.Builder filesToStage = ImmutableList.builder(); +ImmutableMap.Builder> +lazyArtifactsBuilder = ImmutableMap.builder(); for (String path : pathsToStage) { File file = new File(path); - if (new File(path).exists()) { -// Spurious items get added to the classpath. Filter by just those that exist. -if (file.isDirectory()) { - // Zip up directories so we can upload them to the artifact service. - try { -filesToStage.add(createArtifactInformation(zipDirectory(file))); - } catch (IOException e) { -throw new RuntimeException(e); - } -} else { - filesToStage.add(createArtifactInformation(file)); -} + // Spurious items get added to the classpath. Filter by just those that exist. + if (file.exists()) { +String id = UUID.randomUUID().toString(); Review comment: Note that you could put one ID into the map for the entire list of files if you allowed ResolutionFn to return a List/Collection of artifacts. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 414897) Time Spent: 2h 20m (was: 2h 10m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414895&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414895 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 02/Apr/20 17:27 Start Date: 02/Apr/20 17:27 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402477389 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ArtifactResolver.java ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction; + +import java.util.Optional; +import org.apache.beam.model.pipeline.v1.RunnerApi; + +public interface ArtifactResolver { Review comment: Please add comments to this class and methods. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 414895) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414894&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414894 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 02/Apr/20 17:27 Start Date: 02/Apr/20 17:27 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402476502 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ArtifactResolver.java ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction; + +import java.util.Optional; +import org.apache.beam.model.pipeline.v1.RunnerApi; + +public interface ArtifactResolver { + void register(ResolutionFn fn); Review comment: We won't need to rely on using Optional if we make registration take a URN and ResolutionFn and then the resolver can be found by URN. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 414894) Time Spent: 2h 10m (was: 2h) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414893&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414893 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 02/Apr/20 17:27 Start Date: 02/Apr/20 17:27 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r402474608 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1206,6 +1210,11 @@ message MavenPayload { string repository_url = 2; } +message DeferredArtifactPayload { + // A id for deferred artifacts. + string id = 1; Review comment: I was under the impression we were going to make this a `bytes` field so that any deferred information can get passed through and then back to the **creator** whether it be an id or a serialized blob of objects or ... Allowing for `bytes` enables for solutions beyond in memory maps. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 414893) Time Spent: 2h 10m (was: 2h) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414370&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414370 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 01/Apr/20 23:19 Start Date: 01/Apr/20 23:19 Worklog Time Spent: 10m Work Description: ihji commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-607536354 CC: @chamikaramj This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 414370) Time Spent: 2h (was: 1h 50m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 2h > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414368&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414368 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 01/Apr/20 23:13 Start Date: 01/Apr/20 23:13 Worklog Time Spent: 10m Work Description: ihji commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-607534661 CC: @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 414368) Time Spent: 1h 50m (was: 1h 40m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=414307&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414307 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 01/Apr/20 21:49 Start Date: 01/Apr/20 21:49 Worklog Time Spent: 10m Work Description: ihji commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r401930784 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1146,33 +1146,41 @@ message StandardArtifacts { enum Types { // A URN for locally-accessible artifact files. // payload: ArtifactFilePayload -FILE = 0 [(beam_urn) = "beam:artifact:type:file:v1"]; +FILE = 0 [(beam_urn) = "beam:artifact:type:file:v1"]; // A URN for artifacts described by URLs. // payload: ArtifactUrlPayload -URL = 1 [(beam_urn) = "beam:artifact:type:url:v1"]; +URL = 1 [(beam_urn) = "beam:artifact:type:url:v1"]; // A URN for artifacts embedded in ArtifactInformation proto. // payload: EmbeddedFilePayload. -EMBEDDED = 2 [(beam_urn) = "beam:artifact:type:embedded:v1"]; +EMBEDDED = 2 [(beam_urn) = "beam:artifact:type:embedded:v1"]; // A URN for Python artifacts hosted on PYPI. // payload: PypiPayload -PYPI = 3 [(beam_urn) = "beam:artifact:type:pypi:v1"]; +PYPI = 3 [(beam_urn) = "beam:artifact:type:pypi:v1"]; // A URN for Java artifacts hosted on a Maven repository. // payload: MavenPayload -MAVEN= 4 [(beam_urn) = "beam:artifact:type:maven:v1"]; +MAVEN = 4 [(beam_urn) = "beam:artifact:type:maven:v1"]; + +// A URN for locally-accessible artifact directory. +// payload: ArtifactDirectoryPayload +DIRECTORY = 5 [(beam_urn) = "beam:artifact:type:directory:v1"]; } enum Roles { // A URN for staging-to role. // payload: ArtifactStagingToRolePayload STAGING_TO = 0 [(beam_urn) = "beam:artifact:role:staging_to:v1"]; + +// A URN for unzip-to role. +// payload: ArtifactUnzipToRolePayload +UNZIP_TO= 1 [(beam_urn) = "beam:artifact:role:unzip_to:v1"]; } } message ArtifactFilePayload { - // a string for an artifact path e.g. "/tmp/foo.jar" + // a string for an artifact file path e.g. "/tmp/foo.jar" string path = 1; // The hex-encoded sha256 checksum of the artifact. Review comment: adding deferred artifacts and populating sha256. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 414307) Time Spent: 1h 40m (was: 1.5h) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=412540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-412540 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 30/Mar/20 20:02 Start Date: 30/Mar/20 20:02 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#discussion_r400458940 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1146,33 +1146,41 @@ message StandardArtifacts { enum Types { // A URN for locally-accessible artifact files. // payload: ArtifactFilePayload -FILE = 0 [(beam_urn) = "beam:artifact:type:file:v1"]; +FILE = 0 [(beam_urn) = "beam:artifact:type:file:v1"]; // A URN for artifacts described by URLs. // payload: ArtifactUrlPayload -URL = 1 [(beam_urn) = "beam:artifact:type:url:v1"]; +URL = 1 [(beam_urn) = "beam:artifact:type:url:v1"]; // A URN for artifacts embedded in ArtifactInformation proto. // payload: EmbeddedFilePayload. -EMBEDDED = 2 [(beam_urn) = "beam:artifact:type:embedded:v1"]; +EMBEDDED = 2 [(beam_urn) = "beam:artifact:type:embedded:v1"]; // A URN for Python artifacts hosted on PYPI. // payload: PypiPayload -PYPI = 3 [(beam_urn) = "beam:artifact:type:pypi:v1"]; +PYPI = 3 [(beam_urn) = "beam:artifact:type:pypi:v1"]; // A URN for Java artifacts hosted on a Maven repository. // payload: MavenPayload -MAVEN= 4 [(beam_urn) = "beam:artifact:type:maven:v1"]; +MAVEN = 4 [(beam_urn) = "beam:artifact:type:maven:v1"]; + +// A URN for locally-accessible artifact directory. +// payload: ArtifactDirectoryPayload +DIRECTORY = 5 [(beam_urn) = "beam:artifact:type:directory:v1"]; } enum Roles { // A URN for staging-to role. // payload: ArtifactStagingToRolePayload STAGING_TO = 0 [(beam_urn) = "beam:artifact:role:staging_to:v1"]; + +// A URN for unzip-to role. +// payload: ArtifactUnzipToRolePayload +UNZIP_TO= 1 [(beam_urn) = "beam:artifact:role:unzip_to:v1"]; } } message ArtifactFilePayload { - // a string for an artifact path e.g. "/tmp/foo.jar" + // a string for an artifact file path e.g. "/tmp/foo.jar" string path = 1; // The hex-encoded sha256 checksum of the artifact. Review comment: It doesn't seem like we are populating sha256 here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 412540) Time Spent: 1.5h (was: 1h 20m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=410566&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410566 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 26/Mar/20 19:59 Start Date: 26/Mar/20 19:59 Worklog Time Spent: 10m Work Description: ihji commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-604654049 R: @lukecwik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410566) Time Spent: 1h 20m (was: 1h 10m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=409734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-409734 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 25/Mar/20 19:04 Start Date: 25/Mar/20 19:04 Worklog Time Spent: 10m Work Description: ihji commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-604028675 directory type and unzip_to role added. didn't create a interface for resolving artifacts since we only have one (resolving directory) now. maybe refactor later for supporting more resolvers. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 409734) Time Spent: 1h 10m (was: 1h) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=408959&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408959 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 24/Mar/20 17:34 Start Date: 24/Mar/20 17:34 Worklog Time Spent: 10m Work Description: robertwb commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-603394246 In this case, these are directories to be added to the classpath, so it may be preferable to keep them zipped, though such a role does make sense. (This is also an example of a new "directory" type that could be "resolved" to a zipfile.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 408959) Time Spent: 1h (was: 50m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 1h > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=408842&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408842 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 24/Mar/20 15:14 Start Date: 24/Mar/20 15:14 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-603297904 Should we add a `directory` option and role that would be to `unzip`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 408842) Time Spent: 50m (was: 40m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 50m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=408841&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408841 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 24/Mar/20 15:13 Start Date: 24/Mar/20 15:13 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-603297142 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 408841) Time Spent: 40m (was: 0.5h) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Labels: portability > Fix For: 2.21.0 > > Time Spent: 40m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=408660&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408660 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 24/Mar/20 10:46 Start Date: 24/Mar/20 10:46 Worklog Time Spent: 10m Work Description: iemejia commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-603165704 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 408660) Time Spent: 0.5h (was: 20m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Fix For: 2.21.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=408659&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408659 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 24/Mar/20 10:45 Start Date: 24/Mar/20 10:45 Worklog Time Spent: 10m Work Description: iemejia commented on issue #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205#issuecomment-603165704 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 408659) Time Spent: 20m (was: 10m) > Enumerating artifacts is too expensive in Java > -- > > Key: BEAM-9578 > URL: https://issues.apache.org/jira/browse/BEAM-9578 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Heejong Lee >Priority: Critical > Fix For: 2.21.0 > > Time Spent: 20m > Remaining Estimate: 0h > > There are a lot of places (e.g. *ParDoTranslation#getParDoPayload*) which > effectively call *Environments#createOrGetDefaultEnvironment* which causes > [artifacts to be > computed|https://github.com/apache/beam/blob/fc6cef9972780ca6b7525d4aadd65a8344221f1b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L114]. > This leads to zipping directories for non-jar dependencies. > Similar problems may exist for Python/Go. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9578) Enumerating artifacts is too expensive in Java
[ https://issues.apache.org/jira/browse/BEAM-9578?focusedWorklogId=408619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-408619 ] ASF GitHub Bot logged work on BEAM-9578: Author: ASF GitHub Bot Created on: 24/Mar/20 09:36 Start Date: 24/Mar/20 09:36 Worklog Time Spent: 10m Work Description: ihji commented on pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [