[ https://issues.apache.org/jira/browse/BEAM-4291?focusedWorklogId=113390&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-113390 ]
ASF GitHub Bot logged work on BEAM-4291:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Jun/18 21:32
            Start Date: 19/Jun/18 21:32
    Worklog Time Spent: 10m
      Work Description: jkff commented on a change in pull request #5676: [BEAM-4291] Propagates artifact retrieval token in Flink runner and to the Java harness
URL: https://github.com/apache/beam/pull/5676#discussion_r196585622

 ##########
 File path: model/fn-execution/src/main/proto/beam_provision_api.proto
 ##########
 @@ -67,6 +67,10 @@ message ProvisionInfo {
   // (optional) Resource limits that the SDK harness worker should respect.
   // Runners may -- but are not required to -- enforce any limits provided.
   Resources resource_limits = 4;
+
+  // (required) The artifact retrieval token produced by
+  // ArtifactStagingService.CommitManifestResponse.
+  string retrieval_token = 6;

 Review comment:
   This design was discussed some time ago (see also https://github.com/apache/beam/pull/5582) - we're coming from the assumption that all services are (or at least may be) globally distributed and stateless, i.e. we're not relying on the assumption that there's 1 ArtifactRetrievalService per worker or per harness.

   Without that assumption, we need the ArtifactRetrievalService calls to be somehow linked to which job we're talking about. Likewise, ArtifactStagingService also needs to know which job we're talking about.

   We decided to do this by propagating tokens:
   - PrepareJob returns a token used for ArtifactStagingService calls
   - ArtifactStagingService.CommitManifest returns a token used for ArtifactRetrievalService calls

   This token is an opaque string containing the information necessary for the service to do its job. In practice, with the "distributed file system" based implementations of both services, we're using (basically) a base path as the token.
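
   The token flow described above can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions, not code from the PR or from Beam: the function names (prepare_job, put_artifact, commit_manifest, get_manifest, get_artifact), the FAKE_DFS dict standing in for a distributed file system, the MANIFEST file name, and the choice to reuse the staging base path as the retrieval token are all hypothetical.

```python
import json
import posixpath

# Hypothetical stand-in for a distributed file system: full path -> bytes.
FAKE_DFS = {}


def prepare_job(job_name):
    """Stand-in for PrepareJob: returns an opaque staging token.

    Assumption: the token is simply a base path derived from the job name.
    """
    return posixpath.join("/staging", job_name)


def put_artifact(staging_token, name, data):
    """Staging call: needs only the token to know where to write."""
    FAKE_DFS[posixpath.join(staging_token, name)] = data


def commit_manifest(staging_token, names):
    """Stand-in for CommitManifest: writes a manifest, returns a retrieval token.

    Assumption: the retrieval token is the same base path as the staging token.
    """
    manifest_path = posixpath.join(staging_token, "MANIFEST")
    FAKE_DFS[manifest_path] = json.dumps(names).encode("utf-8")
    return staging_token


def get_manifest(retrieval_token):
    """Retrieval call: no per-job server state, only the token is needed."""
    raw = FAKE_DFS[posixpath.join(retrieval_token, "MANIFEST")]
    return json.loads(raw.decode("utf-8"))


def get_artifact(retrieval_token, name):
    """Retrieval call: resolves the artifact relative to the token's base path."""
    return FAKE_DFS[posixpath.join(retrieval_token, name)]


# Job submission side.
staging_token = prepare_job("job-1")
put_artifact(staging_token, "pipeline.jar", b"jar bytes")
retrieval_token = commit_manifest(staging_token, ["pipeline.jar"])

# Harness side: the token arrives via provision info (modelled here as a dict).
provision_info = {"retrieval_token": retrieval_token}
for artifact_name in get_manifest(provision_info["retrieval_token"]):
    get_artifact(provision_info["retrieval_token"], artifact_name)
```

   Because every call takes the token as an argument, none of the functions needs to remember which job it is serving, which is the property the stateless-service assumption asks for.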

   Alternatively we could explicitly include the job ID in the RPCs, but that would require the services to do some sort of global lookup of artifact placement parameters based on job ID; it seems easier to include the necessary parameters explicitly in the token.

   Now, since a retrieval token is needed for the harness to talk to the retrieval service, it seems reasonable to include it in provision info. It's not part of the service descriptor because it does not identify the service; it only gives a necessary argument for its RPCs.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 113390)
    Time Spent: 6h 50m  (was: 6h 40m)

> ArtifactRetrievalService that retrieves artifacts from a distributed filesystem
> -------------------------------------------------------------------------------
>
>                 Key: BEAM-4291
>                 URL: https://issues.apache.org/jira/browse/BEAM-4291
>             Project: Beam
>          Issue Type: Sub-task
>          Components: runner-core
>            Reporter: Eugene Kirpichov
>            Assignee: Axel Magnuson
>            Priority: Major
>             Fix For: 2.6.0
>
>          Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> In agreement with how they are staged in BEAM-4290.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)