[ 
https://issues.apache.org/jira/browse/BEAM-4291?focusedWorklogId=113390&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-113390
 ]

ASF GitHub Bot logged work on BEAM-4291:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Jun/18 21:32
            Start Date: 19/Jun/18 21:32
    Worklog Time Spent: 10m 
      Work Description: jkff commented on a change in pull request #5676: 
[BEAM-4291] Propagates artifact retrieval token in Flink runner and to the Java 
harness
URL: https://github.com/apache/beam/pull/5676#discussion_r196585622
 
 

 ##########
 File path: model/fn-execution/src/main/proto/beam_provision_api.proto
 ##########
 @@ -67,6 +67,10 @@ message ProvisionInfo {
     // (optional) Resource limits that the SDK harness worker should respect.
     // Runners may -- but are not required to -- enforce any limits provided.
     Resources resource_limits = 4;
+
+    // (required) The artifact retrieval token produced by
+    // ArtifactStagingService.CommitManifestResponse.
+    string retrieval_token = 6;
 
 Review comment:
   This design was discussed some time ago (see also 
https://github.com/apache/beam/pull/5582). We're starting from the assumption 
that all services are (or at least may be) globally distributed and stateless, 
i.e. we're not relying on there being one ArtifactRetrievalService per worker 
or per harness. Without that guarantee, ArtifactRetrievalService calls need to 
be linked somehow to the job they concern. Likewise, ArtifactStagingService 
needs to know which job a call refers to.
   
   We decided to do this by propagating tokens:
   
   - PrepareJob returns a token used for ArtifactStagingService calls
   - ArtifactStagingService.CommitManifest returns a token used for 
ArtifactRetrievalService calls
   
   This token is an opaque string containing the information necessary for the 
service to do its job. In practice, with the "distributed file system" based 
implementations of both services, we're using (basically) a base path as the 
token.
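   
   As a rough illustration of that token flow, here is a minimal Python sketch. 
It assumes a local directory standing in for the distributed file system; the 
function names, signatures and the MANIFEST.json file name are hypothetical 
stand-ins, not the actual Beam APIs. The point is only that neither "service" 
keeps per-job state in memory: everything it needs arrives in the token.

```python
import json
import os

# Hypothetical, simplified stand-ins for the services discussed above.
# Names and signatures are illustrative, not the actual Beam APIs.

MANIFEST_NAME = "MANIFEST.json"  # assumed file name, for illustration only


def prepare_job(staging_root, job_name):
    """Stand-in for PrepareJob: returns a staging token (a per-job base path)."""
    return os.path.join(staging_root, job_name)


def put_artifact(staging_token, name, data):
    """Stand-in for a staging upload; all state lives under the token's path."""
    os.makedirs(staging_token, exist_ok=True)
    with open(os.path.join(staging_token, name), "wb") as f:
        f.write(data)


def commit_manifest(staging_token, names):
    """Stand-in for CommitManifest: writes a manifest, returns a retrieval token."""
    os.makedirs(staging_token, exist_ok=True)
    with open(os.path.join(staging_token, MANIFEST_NAME), "w") as f:
        json.dump({"artifacts": sorted(names)}, f)
    # Callers treat the token as opaque; here it happens to be the base path.
    return staging_token


def get_manifest(retrieval_token):
    """Stand-in for a retrieval call: the token alone locates the manifest."""
    with open(os.path.join(retrieval_token, MANIFEST_NAME)) as f:
        return json.load(f)["artifacts"]


def get_artifact(retrieval_token, name):
    """Stand-in for a retrieval call: the token alone locates the artifact."""
    with open(os.path.join(retrieval_token, name), "rb") as f:
        return f.read()
```

   Two jobs staged under the same root get different tokens, so any instance 
of the retrieval functions can serve either job without knowing about it in 
advance.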
   
   Alternatively, we could explicitly include the job ID in the RPCs, but that 
would require the services to do some sort of global lookup of artifact 
placement parameters based on job ID. It seems easier to include the necessary 
parameters explicitly in the token.
   
   Now, since the harness needs a retrieval token to talk to the retrieval 
service, it seems reasonable to include it in provision info. It's not part of 
the service descriptor because it does not identify the service; it only 
supplies a required argument for its RPCs.
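
   To make the descriptor/token split concrete, a small Python sketch follows. 
The classes are simplified, hypothetical stand-ins (the real messages are 
protos with more fields), and get_manifest_call is an illustrative helper, not 
an existing function. It shows one shared endpoint serving two jobs, with only 
the request argument differing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescriptor:
    """Hypothetical stand-in: identifies where the service lives, nothing else."""
    url: str


@dataclass(frozen=True)
class ProvisionInfo:
    """Hypothetical stand-in: per-job information handed to the harness."""
    retrieval_token: str


def get_manifest_call(descriptor, info):
    """Returns (endpoint, request) for an illustrative manifest request."""
    # The endpoint comes from the descriptor; the token is only an RPC argument.
    return descriptor.url, {"retrieval_token": info.retrieval_token}
```

   With this split, the same ServiceDescriptor can be reused across jobs while 
each harness receives its own ProvisionInfo.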

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 113390)
    Time Spent: 6h 50m  (was: 6h 40m)

> ArtifactRetrievalService that retrieves artifacts from a distributed 
> filesystem
> -------------------------------------------------------------------------------
>
>                 Key: BEAM-4291
>                 URL: https://issues.apache.org/jira/browse/BEAM-4291
>             Project: Beam
>          Issue Type: Sub-task
>          Components: runner-core
>            Reporter: Eugene Kirpichov
>            Assignee: Axel Magnuson
>            Priority: Major
>             Fix For: 2.6.0
>
>          Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> In agreement with how they are staged in BEAM-4290.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
