Hello again,

Up until now, tools that extend BuildStream's capabilities or integrate it
with complementary systems have been developed without appropriate APIs to
build on. To mention a few examples:

* bst-graph to better understand element dependencies [1].
* bst-to-lorry to facilitate mirroring sources via lorry [2].
* buildstream-license-checker to identify the licenses of element sources [3].
* And other tools for generating common reports like SBOMs [4].

Although these tools technically serve their purpose, they come with risks,
drawbacks and limitations due to the lack of appropriate APIs. In the case of
the tools mentioned above:

* Some rely on BuildStream’s private Python APIs, e.g., to collect the 
dependency elements of a given top-level element, at the cost of possibly being 
broken after each BuildStream update.
* Others are built on top of the bst command-line interface, e.g., calling “bst 
show” and other commands, parsing the output at the cost of being much slower 
and having limited access to details.
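
To make the second point concrete, here is a minimal sketch of the kind of
wrapper such tools end up writing. The "name|state" format string and the
helper names (parse_show_output, show_elements) are my own assumptions for
illustration, not code taken from any of the tools above. Each query spawns
a full bst process, and the tool only sees whatever fits in the format string.

```python
import subprocess
from typing import Dict

# Assumed format string: one element per line, rendered as "name|state".
SHOW_FORMAT = "%{name}|%{state}"


def parse_show_output(text: str) -> Dict[str, str]:
    """Map element names to their reported state from "name|state" lines."""
    elements: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if "|" not in line:
            continue  # skip blank lines and anything that is not a record
        name, state = line.split("|", 1)
        elements[name] = state
    return elements


def show_elements(target: str) -> Dict[str, str]:
    """Run "bst show" for one target and parse its standard output.

    Hypothetical helper: it requires bst on PATH and must be run from
    inside a BuildStream project directory.
    """
    result = subprocess.run(
        ["bst", "show", "--deps", "all", "--format", SHOW_FORMAT, target],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_show_output(result.stdout)
```

Anything that cannot be expressed through the format string, or that changes
shape between releases, has to be re-parsed by hand in each tool.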

Therefore, I am wondering if the project would consider opening up a limited
subset of existing APIs, so that third-party tools can be fast and do not
break with each BuildStream update.

In order to convey an idea of what this would mean, the following are example 
operations required by these tools:

* Query element dependencies and introspect their attributes and public data,
e.g., to build a dependency graph.
* Query element sources and introspect their attributes, e.g., to list all
external sources and generate lorry configuration files. Note that parts of
this particular topic are currently being discussed in a separate thread [5].
* Check out all element sources, e.g., to run a license scanner on the source
files.
* Query element artifacts, e.g., to generate an SBOM manifest.

One way to enable these kinds of operations could be to promote BuildStream’s
Stream class [6] to public status or, even better, to provide a public version
of that class that only exposes a subset of the private class. This public
class could also be in charge of “proxifying” the objects it exposes, e.g.,
returning ElementProxy objects instead of Element objects, so we can limit
what gets exposed indirectly.

See the following sketch:

<sketch>
# The reason we want to expose the Stream class is because we want
# to build tools that are similar to bst commands but focused on
# introspecting BuildStream projects.
#
# Therefore, it would make sense to follow the existing structure
# given that it was designed for that as well, but limited to what
# is strictly needed for introspection.

from buildstream.public import (
        App,
        Stream,
        Project,
        PipelineSelection,
)

# The question is how to provide something similar to the private version of
# the App class, without risking setting the current private App in stone.
#
# We could define a public version of the private App class to achieve it.
#
# Candidate methods and attributes for the public version of the App class
# could be:
# App.create() -> App
# App.initialized() -> None
# App.project: Project, or just App.get_project_default_targets() -> List[str]
# App.stream: Stream
#
# Note Project and Stream here refer to public versions of these classes as
# well.

options = {}
app = App.create(options)

with app.initialized():
    # Similarly, a public version of the Stream class could be defined
    # with the goal of introspecting the BuildStream project and nothing more.
    #
    # Candidate methods for the public version of the Stream class could be:
    # Stream.load_selection() -> List[ElementProxy]
    # Stream.query_cache() -> None
    # Stream.source_checkout() -> None
    # Stream.artifact_show() -> List[Union[ElementProxy, ArtifactElement]]
    # Stream.artifact_list_contents() -> Dict[str, List[str]]
    # Stream.artifact_checkout() -> None
    #
    # Note that the public versions of these methods should guarantee that
    # returned objects are always "proxified".
    # 
    # Note that ArtifactElement currently does not have a proxy version.

    dependencies = app.stream.load_selection(
        ["element/example.bst"],
        selection=PipelineSelection.RUN,
    )
</sketch>

What do you all think about the problem presented here and the overall idea of
opening up these APIs? Do you see a different way of achieving the same goal?

Your feedback is needed!

Regards,
Martín.

Refs:
[1] https://github.com/apache/buildstream/blob/master/contrib/bst-graph
[2] https://gitlab.com/CodethinkLabs/lorry/bst-to-lorry
[3] https://gitlab.com/BuildStream/buildstream-license-checker
[4] https://en.wikipedia.org/wiki/Software_supply_chain
[5] https://lists.apache.org/thread/q6gxjpld2vb1c9rqlsv24m12c087snc4
[6] https://github.com/apache/buildstream/blob/master/src/buildstream/_stream.py
