Re: Shallow git update in bootstrap
Hi,

On 06 Aug 2023 at 16:56:45, Bruno Haible wrote:
> Carles Pina i Estany wrote:
> > When I say "long time" (and data transmission) in my case it's 9
> > minutes:
> > -
> > carles@pinux:[master]~/git/wget2$ time git submodule update --init
> > Cloning into '/home/carles/git/wget2/gnulib'...
> > Submodule path 'gnulib': checked out
> > '2ae6faf9c78384dc6a2674b62dd56ff153cd51f6'
> >
> > real 9m1,135s
> > user 6m24,309s
> > sys 0m5,020s
> > -

The above was with a 4G+ connection and a laptop.

> I can reproduce that the --depth option has a big impact on the
> 'git clone' execution time:
>
> - no --depth: 50 sec.
> - --depth=2000: 16 sec.
> - --depth=1: 5 sec.

On my day-to-day VPS server (low spec):
- no --depth: 6 minutes (majority of time in "resolving deltas")
  (152 MB in the cloned directory)
- --depth=2000: 1 min 30 sec (100 MB)
- --depth=1: 7 seconds (88 MB)

> However, --depth=1 has the problem that it may/will cause trouble to the
> developer later, if they use more than "git pull". Namely,
> - In 'git log' the history will be truncated,
> - 'git bisect' may not work,
> - 'git annotate' will show a wrong author for many lines of code.
>
> '--depth=2000' would be a middle ground, but it still has the 'git annotate'
> problem.

I agree with the above.

> These troubles are probably not worth the saved 'git clone' time upfront.
>
> However, when doing automated builds, such as continuous integration,
> --depth=1 saves a lot of time, and is not problematic, since the build
> directory is getting deleted anyway 10 minutes later.
>
> How about adding to 'bootstrap' an option '--for-build' that has the
> effect that all submodule clones will be fetched with --depth=1 ?

From my initial point of view (slower connections, metered connections) and
also for saving CI build time and bandwidth (including git.savannah.gnu.org
bandwidth): an option '--for-build' seems very useful.
Thanks for considering it,

--
Carles Pina i Estany
https://carles.pina.cat || Wiktionary translations: https://kamus.pina.cat
Re: Shallow git update in bootstrap
> However, --depth=1 has the problem ...

Another problem of --depth=1 is that it may fail with git versions < 2.8:
https://jira.mariadb.org/browse/MDEV-28032?workflowName=MariaDB+v4=1
https://github.com/git/git/commit/fb43e31f2b43076e7a30c9cd00d024

Therefore, it's not OK to use --depth=1 by default. But it would be OK to do
it based on a command-line option or environment variable.

Bruno
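The gating described above (shallow fetch only on request, and only when git is at least 2.8) could be sketched roughly as follows. This is only an illustration: the environment variable GNULIB_SHALLOW and both function names are hypothetical and do not exist in 'bootstrap'.

```shell
# Hypothetical helper: succeed if dotted version $1 is >= dotted version $2.
version_at_least() {
  awk -v have="$1" -v want="$2" 'BEGIN {
    n = split(have, h, ".")
    m = split(want, w, ".")
    max = (n > m) ? n : m
    for (i = 1; i <= max; i++) {
      a = h[i] + 0      # missing or non-numeric components count as 0
      b = w[i] + 0
      if (a > b) exit 0
      if (a < b) exit 1
    }
    exit 0              # all components equal
  }'
}

# Hypothetical helper: print "--depth=1" only when the user opted in through
# GNULIB_SHALLOW=yes (an assumed variable name) and the given git version
# is at least 2.8; print nothing otherwise.
shallow_args() {
  if [ "${GNULIB_SHALLOW:-no}" = yes ] && version_at_least "$1" 2.8; then
    echo "--depth=1"
  fi
}

# Possible use inside a script (commented out so the sketch has no side effects):
#   git_version=$(git --version | awk '{ print $3 }')
#   git submodule update --init $(shallow_args "$git_version") -- gnulib
```

Without the opt-in, or with an older git, the helper prints nothing and the update command falls back to a full fetch.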
Re: Shallow git update in bootstrap
Carles Pina i Estany wrote:
> Actually, the first time I wondered if the connection or something else
> failed.

The latter is a user mistake, since there was a message
"Cloning into '/home/carles/git/wget2/gnulib'..." and the "..." tells that
it may take some time.

> When I say "long time" (and data transmission) in my case it's 9
> minutes:
> -
> carles@pinux:[master]~/git/wget2$ time git submodule update --init
> Cloning into '/home/carles/git/wget2/gnulib'...
> Submodule path 'gnulib': checked out
> '2ae6faf9c78384dc6a2674b62dd56ff153cd51f6'
>
> real 9m1,135s
> user 6m24,309s
> sys 0m5,020s
> -

I can reproduce that the --depth option has a big impact on the
'git clone' execution time:

- no --depth: 50 sec.
- --depth=2000: 16 sec.
- --depth=1: 5 sec.

However, --depth=1 has the problem that it may/will cause trouble to the
developer later, if they use more than "git pull". Namely,
- In 'git log' the history will be truncated,
- 'git bisect' may not work,
- 'git annotate' will show a wrong author for many lines of code.

'--depth=2000' would be a middle ground, but it still has the 'git annotate'
problem.

These troubles are probably not worth the saved 'git clone' time upfront.

However, when doing automated builds, such as continuous integration,
--depth=1 saves a lot of time, and is not problematic, since the build
directory is getting deleted anyway 10 minutes later.

How about adding to 'bootstrap' an option '--for-build' that has the
effect that all submodule clones will be fetched with --depth=1 ?

Tim Rühsen wrote:
> To speed things up in container CI environments:
> If containers are only used once, git clone gnulib at image creation
> time and do "rmdir gnulib && mv /gnulib . && git submodule update
> gnulib" in the container.

Nice trick. Let's see how it competes with a --depth=1 option. I would
expect that if you use the same image for a year, the
'git submodule update gnulib' step gets slower and slower over that year,
until you create a new image.
Bruno
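For illustration only, the proposed '--for-build' option might be wired up along these lines. The option is a proposal from this thread, not an existing 'bootstrap' feature, and the function and variable names below are hypothetical. The command is printed rather than executed so that the sketch has no side effects.

```shell
for_build=false

# Hypothetical argument parser: remember whether --for-build was given.
parse_args() {
  for_build=false
  for arg in "$@"; do
    case $arg in
      --for-build) for_build=true ;;
    esac
  done
}

# Print the submodule update command that would be run for path $1.
# With --for-build, the fetch is shallow (--depth=1); otherwise it is full.
submodule_update_cmd() {
  if [ "$for_build" = true ]; then
    echo "git submodule update --init --depth=1 -- $1"
  else
    echo "git submodule update --init -- $1"
  fi
}

# Example:
#   parse_args --for-build
#   submodule_update_cmd gnulib
```

The default stays a full clone, so developers keep a usable 'git log', 'git bisect' and 'git annotate'; only throwaway build directories opt in to the shallow fetch.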
Re: Shallow git update in bootstrap
On 8/6/23 00:25, Carles Pina i Estany wrote:
> When I say "long time" (and data transmission) in my case it's 9 minutes:
> -
> carles@pinux:[master]~/git/wget2$ time git submodule update --init
> Cloning into '/home/carles/git/wget2/gnulib'...
> Submodule path 'gnulib': checked out '2ae6faf9c78384dc6a2674b62dd56ff153cd51f6'
>
> real 9m1,135s
> user 6m24,309s
> sys 0m5,020s
> -

Not answering your question, but this may be helpful:

If you regularly build wget2 from git, you only have to download gnulib
once. When the gnulib submodule gets updated by the project (happens from
time to time), only the missing parts are downloaded by git, which should be
fast even on slow network connections.

Another option is to git clone gnulib into a separate directory outside the
project directory and set the env variable GNULIB_REFDIR to this directory
(e.g. "export GNULIB_REFDIR=/home/carles/git/gnulib"). When needed (or
eventually), use "git pull" from inside the gnulib directory to update it.
The `./bootstrap` script in the wget2 project then fetches the needed gnulib
commits from $GNULIB_REFDIR.

To speed things up in container CI environments:
If containers are only used once, git clone gnulib at image creation time
and do "rmdir gnulib && mv /gnulib . && git submodule update gnulib" in the
container. This is still experimental, just started using it yesterday
without experiencing any downsides so far.

Regards, Tim
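The container trick above could be wrapped in a small helper, sketched here under the assumption that gnulib was already cloned to some location when the image was built. The function name and its two parameters are hypothetical; the paths in the usage comment are the ones from the message.

```shell
# Hypothetical helper: replace the empty submodule directory $2 with the
# pre-cloned gnulib checkout found at $1.
adopt_prefetched_gnulib() {
  prefetched=$1   # e.g. /gnulib, cloned at image creation time
  target=$2       # e.g. gnulib, the still-empty submodule directory

  [ -d "$prefetched" ] || return 1

  # rmdir only succeeds on an empty directory, so an already populated
  # checkout is never overwritten by accident.
  if [ -d "$target" ]; then
    rmdir "$target" || return 1
  fi

  mv "$prefetched" "$target"
}

# In the container, from the project's top-level directory (commented out
# so the sketch has no side effects):
#   adopt_prefetched_gnulib /gnulib gnulib && git submodule update gnulib
```

After the move, 'git submodule update gnulib' only has to fetch whatever commits were added since the image was created.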
Shallow git update in bootstrap
Hi,

This is a wishlist / question regarding using "--depth 2" in
"git submodule init --" in the bootstrap file.

I was building a project that uses gnulib (with bootstrap). ./bootstrap does:
"""
if git_modules_config submodule.gnulib.url >/dev/null; then
  echo "$0: getting gnulib files..."
  git submodule init -- "$gnulib_path" || exit $?
  git submodule update -- "$gnulib_path" || exit $?
"""

The "git submodule update" takes a long time. Would it be possible to use
"--depth 1" there (and in the other "git submodule update"s)?

A few lines below, it checks whether "git clone -h 2>&1" has the option
--depth and uses it if possible. Perhaps the same approach could be used for
the "git submodule update"s? (In my case the default code path uses
"git submodule update" and not "git clone" with the --depth 2.)

I wonder if there is any reason not to use --depth 2 for the update.

When I say "long time" (and data transmission) in my case it's 9 minutes:
-
carles@pinux:[master]~/git/wget2$ time git submodule update --init
Cloning into '/home/carles/git/wget2/gnulib'...
Submodule path 'gnulib': checked out '2ae6faf9c78384dc6a2674b62dd56ff153cd51f6'

real 9m1,135s
user 6m24,309s
sys 0m5,020s
-

Actually, the first time I wondered if the connection or something else
failed.

Thank you very much,

--
Carles Pina i Estany
https://carles.pina.cat || Wiktionary translations: https://kamus.pina.cat
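The probing idea mentioned above (look for --depth in the help output and use it only when present) might look roughly like the sketch below. The function name is hypothetical, and whether a given git version lists --depth in the help text of a particular subcommand is an assumption that would need checking; the help text is therefore passed in as an argument.

```shell
# Hypothetical helper: given the help text of a git subcommand ($1) and a
# desired depth ($2), print "--depth N" if the help text mentions --depth,
# and print nothing otherwise.
depth_args_for() {
  if printf '%s\n' "$1" | grep -q -e '--depth'; then
    printf '%s\n' "--depth $2"
  fi
}

# Possible use (commented out so the sketch has no side effects):
#   shallow=$(depth_args_for "$(git clone -h 2>&1)" 2)
#   git submodule update --init $shallow -- "$gnulib_path"
```

When the probe finds no --depth in the help text, $shallow is empty and the command degrades to a normal full update.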