NightOwl888 commented on PR #1172:
URL: https://github.com/apache/lucenenet/pull/1172#issuecomment-3263289030
I have worked out how to successfully cache NuGet packages.
But after some research, I discovered that there are 2 caching goals that
are typically employed (sometimes both at the same time):
### Restore caching
Restore caching backs up and restores the local NuGet cache to the global
GitHub Actions cache. This is the most impactful cache option for small to
medium sized projects, since the time it takes to build is far dwarfed by the
time it takes to restore.
#### Cache key
The cache key only needs to be based on a hash of every project's
configuration of `<PackageRefernce>` elements and `<PackageVersion>` elements,
and the current OS.
```bash
# '**/*.*proj' includes .csproj, .vbproj, .fsproj, msbuildproj, etc.
# '**/*.props' includes Directory.Packages.props, Directory.Build.props, and
Dependencies.props
# '**/*.targets' includes Directory.Build.targets
# '**/*.sln' and '*.sln' ensure root solution files are included (minimatch
glitch for file extension .sln)
# 'global.json' included for SDK version changes
key: nuget-v1-${{ runner.os }}-${{ hashFiles('**/*.*proj', '**/*.props',
'**/*.targets', '**/*.sln', '*.sln', 'global.json') }}
```
#### Restore keys
There is also a setting for `restore-keys`. This is meant for npm, pip, or
other such package mangers where a partial hit is still valid. This approach
does not usually work for NuGet caches, so the setting should be omitted.
### Incremental build caching
Incremental build caching backs up and restores the local `obj` folders
(build output) to the global GitHub Actions cache. This may be impactful for
large projects, particularly those with many build jobs and common
dependencies, but typically for small projects with only a single build it
isn't worth the effort. This strategy takes advantage of the fact that the
GitHub Actions cache is **shared** between build agents.
#### Cache key
The cache key is typically based on a hash of every project's
`packages.lock.json` files, project files, `global.json` file, the OS, target
framework, and architecture (x64, arm64, etc).
```bash
key: build-${ runner.os }-${ matrix.tfm }-${ matrix.arch }-${
hashFiles('**/packages.lock.json', '**/*.*proj', 'global.json') }
```
For very large repositories, combining the project file hashes +
`packages.lock.json` is the safest option.
#### Restore keys
In this case restore keys can make a difference when most of the projects
don't change and only a few of them do. So, having a a fallback could improve
performance a bit if there is a cache miss.
#### Lock files
The `packages.lock.json` files only make sense to have on the CI server for
caching key purposes and there is a `dotnet restore --use-lock-file` command to
create them on demand. There is no requirement to set the
`RestorePackagesWithLockFile` setting or commit the lock files to the
repository. So, either of these 2 options is non-intrusive:
1. Leave `RestorePackagesWithLockFile` unset and use `dotnet restore
--use-lock-file` in CI
2. Set `RestorePackagesWithLockFile` and add `packages.lock.json` files to
`.gitignore`
> Technically, the `RestorePackagesWithLockFile` setting must be set for
`dotnet` commands run after `dotnet restore` to honor the lock files. However,
for caching purposes this doesn' matter because all we care about is key
consistency. In this scenario, the lock files are **cache metadata**, not for
dependency pinning.
---------------------------
After looking at the configuration in this PR, it seems that it is mixing
apples and oranges.
```yml
- name: Cache NuGet Packages
uses: actions/cache@v3
id: lucene-nuget-cache
with:
path: |
~/.nuget/packages
key: `${{ runner.os }}-nuget-`${{
hashFiles('**/packages.lock.json') }}
restore-keys: |
`${{ runner.os }}-nuget-lucene
- name: Restore dependencies
run: dotnet restore
```
There are a few different issues with this approach.
1. The path should not be hard coded to `~/.nuget/packages` since the
canonical way to retrieve the path is the `NUGET_PACKAGES` environment
variable, which is used by the `dotnet restore` and other commands.
2. Using the hash of '**/packages.lock.json' without including target
framework and architecture in the cache key is problematic. But even if that
were straightened out so the cache is correctly busted, there is no reason to
store NuGet caches based off of more than just OS. This arrangement will create
extra unnecessary duplicate caches. The bug here is that there is a mismatch
between what is being cached and how it is being keyed.
4. The restore key `${{ runner.os }}-nuget-lucene` is effectively the same
thing as having no key, because it will never match a prefix of the pattern
`${{ runner.os }}-nuget-${{ hashFiles('**/packages.lock.json') }}`. For the
fallback to work, it must be a prefix (or created like this elsewhere in GitHub
Actions).
-------------------------
### Proposal
I propose that we change this PR to use **restore caching** only, because
that will certainly have a big impact. From that point, we can decide whether
it will be worth the effort to enable incremental build caching (separate PR).
My best guess is that it will because we have so many parallel builds that
share dependencies.
I would be happy to fix the PR if you agree.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]