gtristan commented on PR #2035:
URL: https://github.com/apache/buildstream/pull/2035#issuecomment-3112839840
> Here is an example of the current state of the inspect output against a
> test project.
>
> Any insight into specifically what types of fields we would like to
> include in this output as well as general feedback would be appreciated.

Here are my thoughts for today...
## Connecting the dots
Something that we'll absolutely need is a way for scripts to piece together
the data in interesting ways; this means we need reliable ways to make
references to other data in the JSON output.
For instance, when printing a dependency, we need a reliable way to uniquely
address that dependency in the accompanying data.
### Element references (dependencies)
How to uniquely address an element dependency in the output may depend on
other things.
Currently with `bst show` output, we print the full element paths (including
the junction prefixes) which would allow one to make the connection.
In your comment you are proposing:
```yaml
"dependencies": [
  "import-local-files.bst",
  "import-remote-files.bst"
],
"build_dependencies": [
  "import-local-files.bst",
  "import-remote-files.bst"
],
"runtime_dependencies": [
  "import-local-files.bst",
  "import-remote-files.bst"
],
```
I think it would be better to have one `"dependencies"` list following the
[full dependency dictionary](https://docs.buildstream.build/2.5/format_declaring.html#expressing-dependencies),
e.g.:
```
"dependencies" : [
  {
    "name": "foo.bst",
    "junction": "junction.bst",
    "type": "dependency type"
  }
]
```
Where `"type"` can be any of the [valid dependency
types](https://docs.buildstream.build/2.5/format_declaring.html#format-dependencies-types).
This would mean that for a dependency in the toplevel project, the `"junction"`
entry would be empty; if the element is in a sub-subproject, the junction would
be a junction path, such as `"subproject.bst:subsubproject.bst"`.
### Project junction
In order to piece together the data, we should have a special `"junction"`
parameter which is serialized as a regular element, serializing data from the
junction element directly.
In this way, the junction's `"name"` will be a junction element present in the
parent project from which the project was loaded.
Also, this is very helpful as it adds the ability to provide a
[SourceInfo](https://docs.buildstream.build/2.5/buildstream.source.html#buildstream.source.SourceInfo)
list for the junction itself.
### Project elements
I think that the elements which were loaded from a given project should be
found in a nested `"elements"` list defined within the `"project"`, e.g.:
```yaml
{
  "projects" : [
    {
      "name": "freedesktop-sdk",
      "junction" : {
        "name": "freedesktop-sdk.bst",
        "source-info" : [
          ...
        ]
      },
      # Serialize the elements from this project which are loaded in this pipeline
      "elements" : [
        ...
      ]
    }
  ],
  ...
}
```
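A minimal sketch of how a script could use this nesting to build a lookup table keyed by unique element address. It assumes (hypothetically) that each project's `"junction"` entry carries, like a dependency reference, a `"junction"` path for the project in which the junction element itself lives, and that the toplevel project has no `"junction"` at all; the helper names are illustrative.

```python
# Hypothetical lookup table built from the nested "projects" layout.
# Assumes each project's "junction" entry has "name" plus a "junction"
# path for the project containing that junction element; the toplevel
# project has no "junction" entry.
from typing import Any, Dict


def project_junction_path(project: Dict[str, Any]) -> str:
    """Junction path used to reach this project ("" for the toplevel)."""
    junction = project.get("junction")
    if not junction:
        return ""
    parent = junction.get("junction", "")
    return f"{parent}:{junction['name']}" if parent else junction["name"]


def index_elements(data: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Map each element's unique address to its serialized dictionary."""
    index: Dict[str, Dict[str, Any]] = {}
    for project in data["projects"]:
        prefix = project_junction_path(project)
        for element in project.get("elements", []):
            key = f"{prefix}:{element['name']}" if prefix else element["name"]
            index[key] = element
    return index


data = {
    "projects": [
        {"name": "toplevel", "elements": [{"name": "foo.bst"}]},
        {
            "name": "freedesktop-sdk",
            "junction": {"name": "freedesktop-sdk.bst", "junction": ""},
            "elements": [{"name": "foo.bst"}],
        },
    ]
}
print(sorted(index_elements(data)))  # ['foo.bst', 'freedesktop-sdk.bst:foo.bst']
```

Two elements named `foo.bst` stay distinguishable, which is what dependency references need in order to point at the right one.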
## Let's start smaller and build on that
There is a lot of stuff here whose usefulness is unclear; let's reduce this
dramatically and iteratively add aspects to the serialized data, giving careful
thought to what each might be useful for.
### User config
I think we should just drop the `"user_config"` stuff; it is unclear whether
any of this will be useful.
Also, in terms of the BuildStream data model, it doesn't seem to make sense to
dump data about "project config" and "user config" separately.
The finalized data model is an amalgamation of defaults, user configuration,
project configuration and element specific information, and I am skeptical
about serializing/framing the loaded data model in terms of the input
which BuildStream uses to load the data model.
In any case, I suggest we just drop all of this for now.
### Project level things
* `"duplicates"` and `"declarations"`
I don't think we need `"duplicates"` or `"declarations"` here; it is also not
clear what you intend `"declarations"` to actually mean.
About duplicates: given my previous comment about *Connecting the dots* above,
I do not think it is relevant to serialize how a project configures this;
instead we should focus on how the data model was constructed.
In the case that the same project was loaded twice in the pipeline, the two
instances will be distinguished by their `"junction"`, and any dependency on an
element that is loaded twice in the pipeline will be able to distinguish
*which* `"foo.bst"` was being referred to, by way of the strong element
references described in my previous comment about *Element references*.
* `"element_overrides"` and `"source_overrides"`
I don't know what you intended here, but under either of my interpretations,
let's drop these.
If we are talking about [overriding
elements](https://docs.buildstream.build/2.5/elements/junction.html#overriding-elements)
in subprojects via junctions, then let's ignore this detail.
The resolved data model will show which element was used from which project,
even in the case that a subproject ends up depending on an element in its
parent project, which depends on elements in the parent and subproject.
I think instead you are referring to [overriding plugin
configuration](https://docs.buildstream.build/2.5/format_project.html#overriding-plugin-defaults).
There is no reason to include this in the serialized JSON, as it is not part of
the resolved data model. These dictionaries are a part of [element
composition](https://docs.buildstream.build/2.5/format_intro.html#format-composition),
and we are only concerned with providing the resulting `variables`,
`environment`, etc. in the reported elements.
* `"config"` dictionary
I don't think we need any nesting here; we can assume that the toplevel
`"project"` dictionary is *stuff about the project*.
* `"directory"`
I don't think we need this, because it only describes where the project happens
to be checked out and built; I don't see how this is useful for a scripting
interface.
* `"aliases"`
Again, this is input data to the data model, and I don't think it is relevant.
Ultimately, the relevant URLs will be reported in the source URLs.
Later we may consider doing something fancy to report possible mirror URLs for
a given loaded project/user config, allowing us to print which URLs will be
traversed in order for a given source; but nobody is asking for this now, and
we don't need it in an initial revision.
## Plugin serialization
We need to consider different useful things for each plugin origin.
We have local plugins, pip-loaded plugins, and plugins loaded through junctions.
I think it will make sense to have separate lists here within a `"project"`
dictionary, specifying more precisely where a plugin came from.
Similarly to my suggestion about the project junction, we should also
serialize the junction from which the plugins of that origin were loaded, in
the same way (so we can have the SourceInfo describing where the plugins came
from).
While I feel strongly that fully descriptive data needs to be provided
about loaded plugins, I do not think that serializing the project's plugins
in the `bst inspect` output is a hard blocker for an initial version.
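For illustration only, one possible shape for the per-origin lists, and how a script might ask where a plugin kind came from. The `"plugins"`, `"local"`, `"pip"`, `"junctions"` and `"kinds"` keys and the plugin kind names are hypothetical; only the three origins (local, pip, junction) reflect how BuildStream loads plugins.

```python
# Hypothetical per-origin plugin lists within a "project" dictionary.
# All keys ("plugins", "local", "pip", "junctions", "kinds") are
# illustrative assumptions, not existing BuildStream output.
from typing import Any, Dict, Optional


def plugin_origin(project: Dict[str, Any], kind: str) -> Optional[str]:
    """Return a short description of where a plugin kind was loaded from."""
    plugins = project.get("plugins", {})
    if kind in plugins.get("local", []):
        return "local"
    if kind in plugins.get("pip", []):
        return "pip"
    for origin in plugins.get("junctions", []):
        if kind in origin.get("kinds", []):
            # The junction is serialized like a regular element, so its
            # "source-info" can describe where these plugins came from.
            return "junction:" + origin["junction"]["name"]
    return None


project = {
    "name": "example",
    "plugins": {
        "local": ["local-kind"],
        "pip": ["pip-kind"],
        "junctions": [
            {
                "junction": {"name": "plugins.bst", "source-info": []},
                "kinds": ["junction-kind"],
            }
        ],
    },
}
print(plugin_origin(project, "junction-kind"))  # junction:plugins.bst
print(plugin_origin(project, "unknown-kind"))   # None
```

Keeping the junction dictionary next to the kinds it provides means the same SourceInfo machinery used for the project junction can describe plugin provenance too.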
## What we need and don't have
* We definitely want the cache keys here.
For each element, at least the equivalent of `%{full-key}` from `bst show`, or
empty if the element or its dependencies are in an *inconsistent state*
(meaning that the element or its dependencies are missing a resolved, specific
source reference).
* Element description
This is easy, and can be useful for generating nice reports.
* Artifact CAS digest
I'm on the fence about whether we need this in an initial revision, although
not having it is a step down from `bst show`.
If we add this, we should have it in a nested `"artifact"` block found on the
element dictionary, which is enumerated on a per-project basis.
E.g.:
```yaml
{
  "projects" : [
    {
      ...
      "elements" : [
        {
          ...
          "artifact" : {
            "cas-digest" : "<digest value>"
          }
        }
      ]
    }
  ]
}
```
This will allow us to extend it later with other things from the artifact,
such as file metadata, build logs, etc.
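A minimal sketch of a report over these fields, assuming a hypothetical `"full-key"` field name for the cache key (the equivalent of `%{full-key}`) and the nested `"artifact"`/`"cas-digest"` block proposed above. Reading the nested block with defaults means output that lacks an `"artifact"` block, or later gains more keys in it, is still handled.

```python
# Hypothetical report over the proposed output. "full-key" is an assumed
# field name; "artifact"/"cas-digest" follow the nested block proposed.
from typing import Any, Dict, List


def report(data: Dict[str, Any]) -> List[str]:
    """One line per element: name, cache key (or "inconsistent"), digest."""
    lines: List[str] = []
    for project in data["projects"]:
        for element in project.get("elements", []):
            # An empty key means the element or its dependencies are in
            # an inconsistent state (no resolved source reference).
            key = element.get("full-key") or "inconsistent"
            # Tolerate elements without an "artifact" block.
            digest = element.get("artifact", {}).get("cas-digest", "")
            lines.append(f"{element['name']}: {key} {digest}".rstrip())
    return lines


data = {
    "projects": [
        {
            "name": "toplevel",
            "elements": [
                {
                    "name": "app.bst",
                    "full-key": "abc123",
                    "artifact": {"cas-digest": "<digest value>"},
                },
                {"name": "lib.bst", "full-key": ""},
            ],
        }
    ]
}
for line in report(data):
    print(line)
# app.bst: abc123 <digest value>
# lib.bst: inconsistent
```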