Re: [PR] Implement bst inspect subcommand [buildstream]

via GitHub Thu, 24 Jul 2025 03:00:08 -0700


gtristan commented on PR #2035:
URL: https://github.com/apache/buildstream/pull/2035#issuecomment-3112839840


   > Here is an example of the current state of the inspect output against a test project.
   > 
   > Any insight into specifically what types of fields we would like to include in this output as well as general feedback would be appreciated.
   > 
   
   Here are my thoughts for today...
   
   ## Connecting the dots
   
   Something that we'll absolutely need is a way for scripts to piece together
   the data in interesting ways. This means we need reliable ways to make references
   to other data in the JSON output.
   
   For instance, when printing a dependency - we need a reliable way to uniquely
   address that dependency in the accompanying data.
   
   
   ### Element references (dependencies)
   
   How to uniquely address an element dependency in the output may depend on other
   things.
   
   Currently with `bst show` output, we print the full element paths (including
   the junction prefixes) which would allow one to make the connection.
   
   In your comment you are proposing:
   
   ```yaml
         "dependencies": [
           "import-local-files.bst",
           "import-remote-files.bst"
         ],
         "build_dependencies": [
           "import-local-files.bst",
           "import-remote-files.bst"
         ],
         "runtime_dependencies": [
           "import-local-files.bst",
           "import-remote-files.bst"
         ],
   ```
   
   I think it would be better to have one `"dependencies"` list following the
   [full dependency dictionary](https://docs.buildstream.build/2.5/format_declaring.html#expressing-dependencies).
   
   E.g.:
   
   ```
         "dependencies" : [
           {
             "name": "foo.bst",
             "junction": "junction.bst",
             "type" : "dependency type"
           },
         ]
   ```
   
   Where `"type"` can be any of the [valid dependency types](https://docs.buildstream.build/2.5/format_declaring.html#format-dependencies-types).
   
   This would mean that for a dependency in the toplevel project, the "junction" entry
   would be empty; if the element is in a sub-subproject, the junction would be a junction
   path, such as `"subproject.bst:subsubproject.bst"`.
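
As a rough sketch of how a script might consume such entries, the Python below turns a dependency dictionary into a single address string, in the style of the junction-prefixed element paths printed by `bst show`. The helper name `element_address` is hypothetical, and treating a missing or empty `"junction"` as "toplevel project" is an assumption based on the description above, not a settled format.

```python
def element_address(dep):
    """Return a unique address string for a dependency dictionary.

    Assumption: an absent or empty "junction" means the element lives
    in the toplevel project.
    """
    junction = dep.get("junction") or ""
    name = dep["name"]
    return f"{junction}:{name}" if junction else name


# Two dependencies that share a name but come from different projects.
deps = [
    {"name": "foo.bst", "junction": "", "type": "build"},
    {"name": "foo.bst", "junction": "subproject.bst:subsubproject.bst", "type": "runtime"},
]

addresses = [element_address(d) for d in deps]
```

The point of the sketch is that the pair of `"junction"` and `"name"` is enough to tell the two `foo.bst` entries apart.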
   
   
   ### Project junction
   
   In order to piece together the data, we should have a special `"junction"` parameter
   which is serialized as a regular element, serializing data from the junction element
   directly.
   
   In this way, the junction's `"name"` will be a junction element present in the parent
   project which the project was loaded from.
   
   Also, this is very helpful as it adds the ability to provide a
   [SourceInfo](https://docs.buildstream.build/2.5/buildstream.source.html#buildstream.source.SourceInfo)
   list for the junction itself.
   
   
   ### Project elements
   
   I think that the elements which were loaded from a given project should be found
   in a nested `"elements"` list defined within the `"project"`.
   
   E.g.:
   
   ```yaml
   {
     "projects" : [
       {
         "name": "freedesktop-sdk",
         "junction" : {
           "name": "freedesktop-sdk.bst",
           "source-info" : [
              ...
           ]
         },
   
         # Serialize the elements from this project, which are loaded in this pipeline
         "elements" : [
            ...
         ]
       }
     ],
     ...
   }
   ```
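
To illustrate why this layout helps scripts, here is a minimal Python sketch that indexes every element by its project junction and element name, then resolves a dependency list back to element dictionaries. Several things here are assumptions rather than part of the proposal: the helper names `index_elements` and `resolve_dependencies`, an element `"name"` key, the toplevel project having no `"junction"` entry, and the project junction `"name"` using the same junction path that dependencies use in their `"junction"` field.

```python
def index_elements(document):
    """Map (junction path, element name) to the element dictionary."""
    index = {}
    for project in document["projects"]:
        # Assumption: the toplevel project carries no "junction" entry.
        junction = (project.get("junction") or {}).get("name", "")
        for element in project.get("elements", []):
            index[(junction, element["name"])] = element
    return index


def resolve_dependencies(element, index):
    """Return the element dictionaries that `element` depends on."""
    return [
        index[(dep.get("junction") or "", dep["name"])]
        for dep in element.get("dependencies", [])
    ]


# Hypothetical, heavily trimmed output used only to exercise the helpers.
document = {
    "projects": [
        {
            "name": "toplevel",
            "elements": [
                {
                    "name": "app.bst",
                    "dependencies": [
                        {"name": "base.bst", "junction": "freedesktop-sdk.bst", "type": "build"},
                    ],
                },
            ],
        },
        {
            "name": "freedesktop-sdk",
            "junction": {"name": "freedesktop-sdk.bst"},
            "elements": [{"name": "base.bst", "dependencies": []}],
        },
    ]
}

index = index_elements(document)
resolved = resolve_dependencies(index[("", "app.bst")], index)
```

A lookup that raises `KeyError` here would mean the output contains a reference that cannot be connected to any serialized element, which is exactly the situation strong references are meant to rule out.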
   
   ## Let's start smaller and build on that
   
   There is a bunch of stuff here which we don't know why it might be useful. Let's
   reduce this dramatically and iteratively add aspects to the serialized data with
   careful thought, depending on what it might be useful for.
   
   
   ### User config
   
   I think we should just drop the `"user_config"` stuff, it is unclear whether
   any of this will be useful.
   
   Also, it doesn't seem to make sense regarding the BuildStream data
   model to dump data about "project config" and "user config" separately.
   
   The finalized data model is an amalgamation of defaults, user configuration,
   project configuration and element specific information, and I am skeptical
   about serializing/framing the loaded data model in terms of the input
   which BuildStream uses to load the data model.
   
   In any case, I suggest we just drop all of this for now.
   
   
   ### Project level things
   
   
   * `"duplicates"` and `"declarations"`
   
     I don't think we need `"duplicates"` or `"declarations"` here, it is not clear
     what you propose `"declarations"` to actually mean, either.
   
     About duplicates: Given my previous comment about *Connecting the dots* above, I do not
     think it is relevant to serialize how a project configures this; instead we focus on how
     the data model was constructed.
   
     In the case that the same project was loaded twice in the pipeline, they will be
     distinguished by their `"junction"`, and any dependencies on an element that is loaded
     twice in the pipeline will be able to distinguish *which* `"foo.bst"` was being
     referred to, by way of providing strong references to elements, as described
     in my previous comment about *Element references*.
   
   * `"element_overrides"` and `"source_overrides"`
   
     I don't know what you intended here, but in either of my interpretations, let's
     drop these.
   
     If we are talking about [overriding elements](https://docs.buildstream.build/2.5/elements/junction.html#overriding-elements)
     in subprojects via junctions, then let's ignore this detail.
   
     The resolved data model will show which element was used from which project, even in the
     case that a subproject ends up depending on an element in its parent project, which
     depends on elements in the parent and subproject.
   
     I think instead you are referring to
     [overriding plugin configuration](https://docs.buildstream.build/2.5/format_project.html#overriding-plugin-defaults).
   
     There is no reason to include this in the serialized JSON, as it is not a part of the
     resolved data model. These dictionaries are a part of the
     [element composition](https://docs.buildstream.build/2.5/format_intro.html#format-composition),
     and we are only concerned with providing the end resulting `variables`, `environment` etc. in
     the reported elements.
   
   * `"config"` dictionary
   
     I don't think we need any nesting here, I think we can assume that the toplevel
     `"project"` dictionary is *stuff about the project*.
   
   
   * `"directory"`
   
     I don't think we need this, because it is only contextual to where the project itself
     is checked out and built - I don't see how this is useful for a scripting interface.
   
   * `"aliases"`
   
     Again this is input data to the data model, but not relevant I think.
   
     Ultimately relevant URL results will be reported in the source URLs.
   
     Later we may consider doing something fancy to report possible mirror URLs for
     a given loaded project/user config - allowing us to print which URLs will be
     traversed in order for a given source - but nobody is asking for this now
     and we don't need it in an initial revision.
   
   ## Plugin serialization
   
   We need to consider some different useful things for each plugin origin.
   
   We have local plugins, pip loaded plugins, and plugins loaded through junctions.
   
   I think it will make sense to have separate lists here within a `"project"`
   dictionary, specifying more precisely where a plugin came from.
   
   Similarly to my suggestion about the project junction, we should also
   serialize the junction from whence the plugins from that origin were
   loaded in the same way (so we can have the SourceInfo describing where
   plugins came from).
   
   While I feel strongly that fully descriptive data needs to be provided
   about loaded plugins, I do not think that serializing the project's plugins
   in the `bst inspect` output is a hard blocker for an initial version.
   
   
   ## What we need and don't have
   
   * We definitely want the cache keys here.
   
     For each element, at least the equivalent of "%{full-key}" from
     `bst show`, or empty if the element or its dependencies are in
     an *inconsistent state* (inconsistent state means that the element
     or its dependencies are missing a resolved, specific source reference).
   
   * Element description
   
     This is easy, and can be useful for generating nice reports
   
   * Artifact CAS digest
   
     I'm on the fence about whether we need to have this in an initial
     revision, not having it is a step down from `bst show`.
   
     If we add this, we should have it in a nested `"artifact"` block
     found on the element dictionary which is enumerated on a per
     project basis.
   
     E.g.:
   
     ```yaml
     {
       "projects" : [
         {
           ...
           "elements" : [
             {
               "artifact" : {
                 "cas-digest" : "<digest value>"
               }
             }
           ]
         }
       ]
     }
     ```
   
     This will allow extensibility for other things from the artifact
     which we'll want to add later, like files metadata, build logs, etc.
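
To make the intended use of these fields concrete, here is a small Python sketch of a report script. The field name `"full-key"` is a hypothetical stand-in for wherever the cache key ends up, and the helper names are likewise invented for illustration; the `"artifact"` and `"cas-digest"` keys follow the example above.

```python
def split_by_consistency(elements):
    """Partition element names into (resolved, inconsistent) lists.

    Assumption: the cache key is stored under a hypothetical "full-key"
    field and is empty when the element is in an inconsistent state.
    """
    resolved, inconsistent = [], []
    for element in elements:
        target = resolved if element.get("full-key") else inconsistent
        target.append(element["name"])
    return resolved, inconsistent


def cas_digest(element):
    """Return the artifact CAS digest, or None when no artifact block exists."""
    return element.get("artifact", {}).get("cas-digest")


# Hypothetical element entries; the key and digest values are placeholders.
elements = [
    {"name": "base.bst", "full-key": "0123abcd", "artifact": {"cas-digest": "<digest value>"}},
    {"name": "app.bst", "full-key": ""},
]

resolved, inconsistent = split_by_consistency(elements)
```

Keeping the digest inside a nested `"artifact"` dictionary means a script like this keeps working when further artifact fields are added later.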
   


