AdrianVovk commented on issue #1851:
URL: https://github.com/apache/buildstream/issues/1851#issuecomment-1668901834

   > Ok so at a glance, I can see some pretty bad problems with your plugins... 
looking right now at 
https://gitlab.com/carbonOS/build-meta/-/blob/35f91b6d/plugins/gen-cargo-lock.py
   
   I'm sorry, but you looked at the wrong plugin!! `gen-cargo-lock` is a very 
special case, as I'll explain below. My other plugins (`gnu` and `go-vendor`) 
should be a lot more normally implemented
   
   Also, I didn't write this one! I got it from here: 
https://gitlab.com/BuildStream/bst-plugins-experimental/-/merge_requests/223
   
   This is part of my bst2 port. The bst1 version of the `gen-cargo-lock` 
plugin that I wrote _did_ actually store `Cargo.lock` in `project.refs` for 
exactly the reasons you detail. However, when trying to figure out how to port 
my plugins to `bst2`, I came across the existing plugin I linked above and 
realized that's a much cleaner solution.
   
   > Your source does not properly implement load_ref(), get_ref(), set_ref(), 
get_unique_key(), is_cached() or is_resolved()
   
   This is because it doesn't need to. `gen-cargo-lock` is a source that is 
always immediately followed by a `cargo` source. The `Cargo.lock` file this 
plugin generates is used for exactly one purpose: by the `cargo` source while 
tracking. All the data contained in the `Cargo.lock` file gets turned into 
buildstream refs by the `cargo` source and put into `project.refs`, so there's 
no need for the `gen-cargo-lock` plugin to have a unique ID. At build time, 
`cargo` has all of its dependencies vendored and so it doesn't need to look at 
`Cargo.lock`. See [here in my 
`project.refs`](https://gitlab.com/carbonOS/build-meta/-/blob/6f57dc2e/project.refs#L732-1020)
 for an example of this
   
   Here's the various steps:
   - During tracking, the `cargo` plugin needs a `Cargo.lock` file. Some 
projects don't ship one. So, we stage a `gen-cargo-lock` element to generate 
it. Then the `cargo` plugin executes, reads the `Cargo.lock` file we just 
generated, and turns it into buildstream refs for all the dependencies. The 
unique-id is calculated from the ref the `cargo` plugin generated, which will 
be the real source of truth when it comes time to stage the sources for the 
build
   - During fetch, we already have all the version numbers and hashes of all 
dependencies tracked under the `cargo` plugin's ref. So, we download those 
dependencies. This is done by the `cargo` source. `Cargo.lock` is redundant here
   - During stage, it's the same as fetch. We have all the data available, so 
we set up a special vendoring directory and populate it with all the libraries. 
Again, all done by the `cargo` source. `Cargo.lock` is redundant here
   
   If it doesn't work for some reason (i.e. if the toolchain in the sandbox 
really does need `Cargo.lock` around during the build) I'll serialize the file 
into a ref like I did before and like you describe.
   
   > but I think a fundamental understanding needs to be explained somewhere in 
the documentation to make plugin authoring more intuitive, I'm not sure where 
in the docs or how best to explain this.
   
   I absolutely agree. `bst`'s plugin architecture definitely felt 
confusing/overwhelming when I first started writing plugins. And I made lots of 
the mistakes you detail here. I fixed them over time as people tried to build 
my project in their own clean environment and failed, and that taught me how 
the system works. But it'd be nice if it could be documented enough to get it 
right the first time :)
   
   A lot of what you wrote here, especially the "Overall reproducibility 
expectations" section, would be useful in the docs. Otherwise the plugin 
architecture feels a bit impenetrable. Some extra docs regarding URL 
translation (when should it be done? how? what does it mean to mark a URL? why 
does this need to be done?) would be nice too; that stuff still confuses me...
   
   It can probably just be top-level documentation for the `Source` class 
itself (i.e. 
[here](https://docs.buildstream.build/2.0/buildstream.source.html)). 
Worst-case, you can add to the Additional Writings section
   
   > As a side note, every source which uses gen-cargo-lock in your project 
shares the same Cargo.lock file, because you have put it directly at the root 
of the mirror directory owned by the gen-cargo-lock source kind.
   
   Hmm so this might be a problem if multiple tracks are happening in parallel. 
Again, we only need to ferry the data from one source to the very next source, 
and then that one will turn it into an actual ref. My concern here is that if 
we have multiple cargo projects being tracked in parallel, they share the one 
file and trample each-other. Somes like I just need to incorporate the project 
and element name into the local cache name to avoid a race condition
   
   > Consequently, your plugin risks introducing different data for the same 
cache key, risking the following scenario: 
   
   This might be an issue if the build chain ends up looking at the 
`Cargo.lock` file anyway, despite not having a need to since all dependencies 
are already vendored.
   
   It would be nice if I could tell, from `stage`, if I'm staging for a track 
or for a build. If it's for a track I'd stage `Cargo.lock`, but if it's for a 
build I'd stage nothing to ensure consistency between builds
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to