gtristan commented on code in PR #1991:
URL: https://github.com/apache/buildstream/pull/1991#discussion_r2158310157


##########
src/buildstream/buildelement.py:
##########
@@ -285,10 +297,19 @@ def configure_sandbox(self, sandbox):
             command_dir = build_root
         sandbox.set_work_directory(command_dir)
 
+    def stage(self, sandbox):
         # Setup environment
-        sandbox.set_environment(self.get_environment())
+        env = self.get_environment()
 
-    def stage(self, sandbox):
+        for digest_variable, element_list in self.__digest_environment.items():
+            dummy_sandbox = SandboxDummy(
+                self._get_context(), self._get_project(), plugin=self, 
stdout=None, stderr=None, config={}
+            )
+            self.stage_dependency_artifacts(dummy_sandbox, element_list)

Review Comment:
   @abderrahim writes:
   
   > >   In the case that we are staging elements in element_list which don't 
depend on eachother, I don't believe that Element.stage_dependency_artifacts() 
will do any sorting.
   >
   > I find that statement weird given that stage_dependency_artifact() is 
advertised to do just that.
   
   Right, this text could be clarified further, this function says:
   
   > *"This is primarily a convenience wrapper around 
`Element.stage_artifact()` which takes care of staging all the dependencies in 
staging order and issueing the appropriate warnings."*
   
   Which is not false, it does stage *all of the specified element* and it does 
so *in staging order*, however if you look at what the code does, it does not 
do any toplevel sorting of the input element sequence.
   
   
   > I would also argue that if this is true, then it is orthogonal to the 
changes in this pull request. There is no difference between:
   >
   > ```yaml
   > depends:
   > - a.bst
   > - b.bst
   > - c.bst
   >   config:
   >     location: /sysroot
   > ```
   > and
   >
   > ```yaml
   > depends:
   > - a.bst
   > - b.bst
   > - c.bst
   >   config:
   >     digest-environment: DEPENDENCIES_CAS_DIGEST
   > ```
   > in the current code, except for the fact that the former stages in a 
subdirectory of the build sandbox and the latter stages in a dummy sandbox.
   
   I think you are missing my meaning with your example.
   
   So here is my hypothesis; given your `a`, `b`, `c` example, let us assume 
that the toplevel element `e` in question depends on `b` and `c` which may also 
depend on `a`, where `b` and `c` install the same file.
   
   e.g.:
   
   ```
               e
             /   \
            b     c  <--- both "b" and "c" install /etc/abc.conf
             \   / 
               a
   ```
   
   What I am saying, is that with this implementation, the value of 
`DEPENDENCIES_CAS_DIGEST` may differ in the following two configurations.
   
   ### Configuration 1
   
   Here we stage element `e` deterministically into the sandbox, and stage 
`c`'s version of `/etc/abc.conf` over `b`'s version of `/etc/abc.conf` in an 
orthogonal, extra stage operation which we use to determine the cas digest 
`DEPENDENCIES_CAS_DIGEST`.
   
   ```yaml
   depends:
   - a.bst
   - b.bst
     config:
       digest-environment: DEPENDENCIES_CAS_DIGEST
   - c.bst
     config:
       digest-environment: DEPENDENCIES_CAS_DIGEST
   ```
   
   ### Configuration 2
   
   Here we do the same as configuration 1, except that `b`'s version of 
`/etc/abc.conf` is staged on top of `c`'s version of the same file, resulting 
in a *different digest*.
   
   ```yaml
   depends:
   - a.bst
   - c.bst
     config:
       digest-environment: DEPENDENCIES_CAS_DIGEST
   - b.bst
     config:
       digest-environment: DEPENDENCIES_CAS_DIGEST
   ```
   
   ### A closer look
   
   The body of `Element.stage_dependency_artifacts()` is doing the following:
   
   ```python
   with self._overlap_collector.session(action, path):
       for dep in self.dependencies(selection):
           dep._stage_artifact(sandbox, path=path, include=include, 
exclude=exclude, orphans=orphans, owner=self)
   ```
   
   And, looking at `Element.dependencies()`, we can quickly see that when the 
`selection` argument is given to select *"Subsets of the dependency graph"*, 
the `self` element is not considered at all in the algorithm.
   
   This means that the overall deterministic staging order *of element `e`* is 
not considered when looping over the dependencies of the `selection` argument 
given to `Element.stage_dependency_artifacts()`.
   
   Arguably this algorithm in `Element.stage_dependency_artifacts()` needs to 
be made more deterministic in this regard, although I wouldn't necessarily say 
that the docs are lying here - they are yielded *in staging order* relative to 
*eachother*, but they are not yielded in *the staging order of the given 
element*.
   
   And so I come back to your original argument:
   
   > I would also argue that if this is true, then it is orthogonal to the 
changes in this pull request.
   
   I have a point of contention here.
   
   I have not yet looked at whether the `location` dependency configuration 
suffers from this issue as well, however, if we have this issue with the 
`location` dependency configuration, it needs to be fixed, and I also do not 
want to go as far as landing *this change* in the knowledge that this is indeed 
an issue.
   
   ### Prove me wrong :)
   
   There may be other ways that I am wrong about this bug, but I can think of 
one possibility:
   
   It is possible that in the `Element` loading/instantiation process, that the 
dependency configurations are exposed to the plugin *in staging order* rather 
than in the order in which they are listed in the element declaration (`.bst` 
file)... if this is the case, `Element.dependencies()` may be sufficient for 
both of these use cases.
   
   It would however be good to add a test with this merge request, asserting 
that we get the same CAS digest environment variable using both orderings of 
element `e` in my overlapping example above.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to