Re: [PATCH/RFC 5/5] add tests for checking the behaviour of unset.variable
Junio C Hamano <gits...@pobox.com> writes:

> Tanay Abhra <tanay...@gmail.com> writes:
>
>> I can think of two solutions: one, leave it as it is and advertise it
>> to be explicitly typed in the config files at the appropriate
>> position; or, change the behavior of unset.variable to unset all
>> matching variables in that file, before and after. We could also
>> change "git config --add" to append at the end of the file regardless
>> of whether the variable exists or not. Which course of action do you
>> think would be best?
>
> Off the top of my head, from an end-user's point of view, something
> like this would give a behaviour that is at least understandable:
>
> (1) forbid the "git config" command line from touching unset.var, as
>     there is no way for a user to control where a new unset.var goes.
>     And

Well, the normal use-case for unset.variable is to put it in a local
config file, to unset a variable set in another, lower-priority file.
This common use-case works with the command-line "git config", and it
would be a pity to forbid the common use-case because of a particular,
unusual case.

> (2) When adding or appending section.var (it may also apply to
>     removing one--you need to think about it deeper), ignore
>     everything that comes before the last appearance of unset.var
>     that unsets the section.var variable.

That would probably be the best option from a user's point of view, but
I'd say the implementation complexity is not worth the trouble.

Alternatively, if the syntax to unset a section.var were not

    [unset]
        variable = section.var

but rather

    [section]
        !variable

or something, then the current "find the section and append at the end"
code may work as-is. But that would break backward compatibility rather
badly: old gits would stop working completely in repositories using
this syntax. Well, perhaps we can also consider that this is
acceptable: just don't use the feature for a few years if you care
about backward compatibility.
--
Matthieu Moy  http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
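For concreteness, the two syntaxes weighed in the message above, written out as config fragments. Note that unset.variable is the proposal under review in this series, not a feature of released Git, and the bang form is only the hypothetical alternative floated above; the variable names are illustrative:

```ini
# Proposed syntax from this series: a variable that unsets another.
[unset]
	variable = section.var

# Hypothetical alternative sketched above (older git would reject
# this config file outright, hence the compatibility concern):
[section]
	!variable
```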
Re: [TOY PATCH]: rebase: Add --show-files option
Hi,

On Fri, Oct 3, 2014 at 5:42 PM, Nazri Ramliy <ayieh...@gmail.com> wrote:
> Hi,
>
> When working on a new feature branch that touches a lot of files I
> tend to make commits that affect only single files, and for very
> small changes. Since at this stage I'm experimenting a lot - trying
> out ideas, etc. - the commits tend to grow a lot (could be 50-70
> individual commits, each modifying one or two files), and I don't
> think much about the commit message beside making a one-liner that
> explains only the gist. Most of the time I include the filename in
> the commit message to help me identify which commits should be
> squashed together later. Only when the feature seems to be functional
> do I "git rebase" the commits in order to shape the history into its
> final, proper form.
>
> When rebasing these upwards of 40+ commits, it is helpful if the
> rebase instruction sheet shows me the actual files that the commits
> affect, so I made this patch (sorry I couldn't attach it inline since
> gmail eats all the tabs) that adds the --show-files option to
> git-rebase to achieve something to this effect:
>
>   pick 996fa59 Remove autoconf submodule
>   #    :100644 100644 cfc8a25... 28ddb02... M  .gitmodules
>   #    :160000 000000 0263a9f... 000...    D   autoconf
>   ... more pick lines
>   pick 4c5070f Remove automake submodule
>   #    :100644 100644 28ddb02... f907328... M  .gitmodules
>   #    :160000 000000 9042530... 000...    D   automake
>
> Having the list of files shown below each commit, indented to reduce
> cluttering the pick instruction, really does help in deciding the
> reorder and squash candidates.

Sounds neat. I do similar things and do sometimes lose track of which
files are being touched by multiple "fix compile error" commits. I
haven't actually looked at your patch because gmail can't display it
in-line, but it's a feature I'd use.

> The files list came from this:
>
>   git show --raw $sha1|awk '/^:/ {print '${comment_char}'\t$0}'
>
> Thoughts?
nazri
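The transformation behind the patch can be sketched without a repository. In this stand-in, `show_files` fakes the output of `git show --raw` using the sample lines from the message above, and awk prefixes each raw diff line with the comment character so it rides along in the instruction sheet as a comment (names here are illustrative, not the patch's actual code):

```shell
#!/bin/sh
# Prefix raw-diff lines (they start with ':') with the comment char,
# as the --show-files patch does for each pick entry.
comment_char='#'

show_files() {
    # Stand-in for: git show --raw "$sha1"
    printf '%s\n' \
        ':100644 100644 cfc8a25... 28ddb02... M	.gitmodules' \
        ':160000 000000 0263a9f... 000...    D	autoconf'
}

show_files | awk -v c="$comment_char" '/^:/ {print c "\t" $0}'
```

Every emitted line starts with `#`, so `git rebase -i` would ignore it when executing the sheet.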
Compile fails on OSX 10.9 with FSF GCC in common crypto
* Works with Apple clang
* Works with FSF GCC and NO_APPLE_COMMON_CRYPTO=yes.
* Fails with FSF GCC, see below
* FWIW, git-1.8.4.4 can be compiled with FSF GCC

$ gcc --version
gcc (GCC) 4.7.3
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ./configure
configure: Setting lib to 'lib' (the default)
configure: Will try -pthread then -lpthread to enable POSIX Threads.
configure: CHECKS for site configuration
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... cpp
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for size_t... yes
checking for working alloca.h... yes
checking for alloca... yes
configure: CHECKS for programs
checking whether we are using the GNU C compiler... (cached) yes
checking whether gcc accepts -g... (cached) yes
checking for gcc option to accept ISO C89... (cached) none needed
checking for inline... inline
checking if linker supports -R... no
checking if linker supports -Wl,-rpath,... yes
checking for gtar... no
checking for tar... tar
checking for gnudiff... no
checking for gdiff... no
checking for diff... diff
checking for asciidoc... no
configure: CHECKS for libraries
checking for SHA1_Init in -lcrypto... yes
checking for curl_global_init in -lcurl... yes
checking for XML_ParserCreate in -lexpat... yes
checking for iconv in -lc... no
checking for iconv in -liconv... yes
checking for deflateBound in -lz... yes
checking for socket in -lc... yes
checking for inet_ntop... yes
checking for inet_pton... yes
checking for hstrerror... yes
checking for basename in -lc... yes
checking for gettext in -lc... no
checking libintl.h usability... no
checking libintl.h presence... no
checking for libintl.h... no
configure: CHECKS for header files
checking sys/select.h usability... yes
checking sys/select.h presence... yes
checking for sys/select.h... yes
checking sys/poll.h usability... yes
checking sys/poll.h presence... yes
checking for sys/poll.h... yes
checking for inttypes.h... (cached) yes
checking for old iconv()... no
configure: CHECKS for typedefs, structures, and compiler characteristics
checking for socklen_t... yes
checking for struct dirent.d_ino... yes
checking for struct dirent.d_type... yes
checking for struct passwd.pw_gecos... yes
checking for struct sockaddr_storage... yes
checking for struct addrinfo... yes
checking for getaddrinfo... yes
checking for library containing getaddrinfo... none required
checking whether the platform regex can handle null bytes... yes
checking whether system succeeds to read fopen'ed directory... no
checking whether snprintf() and/or vsnprintf() return bogus value... no
configure: CHECKS for library functions
checking libgen.h usability... yes
checking libgen.h presence... yes
checking for libgen.h... yes
checking paths.h usability... yes
checking paths.h presence... yes
checking for paths.h... yes
checking libcharset.h usability... yes
checking libcharset.h presence... yes
checking for libcharset.h... yes
checking for strings.h... (cached) yes
checking for locale_charset in -liconv... yes
checking for strcasestr... yes
checking for library containing strcasestr... none required
checking for memmem... yes
checking for library containing memmem... none required
checking for strlcpy... yes
checking for library containing strlcpy... none required
checking for uintmax_t... yes
checking for strtoumax... yes
checking for library containing strtoumax... none required
checking for setenv... yes
checking for library containing setenv... none required
checking for unsetenv... yes
checking for library containing unsetenv... none required
checking for mkdtemp... yes
checking for library containing mkdtemp... none required
checking for mkstemps... yes
checking for library containing mkstemps... none required
checking for initgroups... yes
checking for library containing initgroups... none required
checking for POSIX Threads with ''... yes
configure: creating ./config.status
config.status: creating config.mak.autogen
config.status: executing config.mak.autogen commands

$ make
GIT_VERSION = 2.1.2
    * new build flags
    CC credential-store.o
In file included
Re: Submodules and GIT_ALTERNATE_OBJECT_DIRECTORIES
On 30.09.2014 15:25, Michal Sojka wrote:
> I'd like to shorten the time needed by our continuous integration
> (CI) tool to clone the source repositories. Currently the full clone
> takes about 10 minutes (even from a local server). Our main
> repository has several submodules, so the CI tool runs "git submodule
> update --init". My idea was to use GIT_ALTERNATE_OBJECT_DIRECTORIES
> to cache objects from several submodule repositories locally.
> However, this does not work because GIT_ALTERNATE_OBJECT_DIRECTORIES
> is considered local to the super-project and is not propagated to the
> "git clone" for submodules (git-submodule.sh calls
> clear_local_git_env).
>
> My question is: why is GIT_ALTERNATE_OBJECT_DIRECTORIES considered
> local to the repository? If I could modify the command line, I would
> use "git submodule update" with the --reference option, which is
> propagated to clones of all submodules. Letting
> GIT_ALTERNATE_OBJECT_DIRECTORIES propagate to the submodules should
> have the same effect as the --reference option. So why is it not
> propagated?

Because then it would /always/ propagate? So while that would have the
same effect as using the --reference option, not using the --reference
option would behave differently, no?
Re: Submodules and GIT_ALTERNATE_OBJECT_DIRECTORIES
On Fri, Oct 03 2014, Jens Lehmann wrote:
> On 30.09.2014 15:25, Michal Sojka wrote:
>> I'd like to shorten the time needed by our continuous integration
>> (CI) tool to clone the source repositories. Currently the full clone
>> takes about 10 minutes (even from a local server). Our main
>> repository has several submodules, so the CI tool runs "git
>> submodule update --init". My idea was to use
>> GIT_ALTERNATE_OBJECT_DIRECTORIES to cache objects from several
>> submodule repositories locally. However, this does not work because
>> GIT_ALTERNATE_OBJECT_DIRECTORIES is considered local to the
>> super-project and is not propagated to the "git clone" for
>> submodules (git-submodule.sh calls clear_local_git_env).
>>
>> My question is: why is GIT_ALTERNATE_OBJECT_DIRECTORIES considered
>> local to the repository? If I could modify the command line, I would
>> use "git submodule update" with the --reference option, which is
>> propagated to clones of all submodules. Letting
>> GIT_ALTERNATE_OBJECT_DIRECTORIES propagate to the submodules should
>> have the same effect as the --reference option. So why is it not
>> propagated?
>
> Because then it would /always/ propagate? So while that would have
> the same effect as using the --reference option, not using the
> --reference option would behave differently, no?

That's a good reason, thanks.

Fortunately, I found a way to add --reference to the "submodule
update" command issued by the CI tool (Bamboo). Instead of calling git
directly, the CI tool calls my git script, which modifies the command
line in case of "submodule update" and then calls the real git.

Best regards,
-Michal
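The wrapper trick Michal describes can be sketched in a few lines of shell. Everything here is an assumption for illustration: the cache path, the real-git location, and the idea of computing the rewritten argument list in a function (a real wrapper would end with `exec` on the real git binary):

```shell
#!/bin/sh
# Hypothetical wrapper placed ahead of the real git in PATH: rewrite
# "git submodule update ..." to inject --reference before delegating.
CACHE=/var/cache/git-reference   # assumed local reference repository

# Print the argument list that would be handed to the real git.
rewrite_args() {
    if [ "$1" = "submodule" ] && [ "$2" = "update" ]; then
        shift 2
        printf '%s\n' "submodule update --reference $CACHE $*"
    else
        printf '%s\n' "$*"
    fi
}

# A real wrapper would then do: exec /usr/bin/git $(rewrite_args "$@")
rewrite_args submodule update --init
```

Only the `submodule update` invocation is touched; every other git command passes through unchanged, so the CI tool needs no other configuration.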
Feature request: Tracking info for remote branches
I really like the way Git officializes the relation between branches
via the notion of tracking. I can see which local branch tracks which
remote branch easily, and that's very helpful. But when I find a Git
repository on the Web, I often have no idea about the relationship
between its branches; all I have instead is the branches' names, which
is often not sufficient.

Let's take for example https://bitbucket.org/emiliolopez/linux.git.
Among its branches I see sunxi-codec and sunxi-codec-v0. The
repository name gives me a hint that these are really branches of the
Linux kernel (rather than, say, branches of Emacs). And the branches'
names give me a hint that these are related to support for sunxi (aka
Allwinner SoCs) and more specifically support for codecs. I know from
out-of-band info that these are for audio codecs. But I don't know if
these branches track linux-next, sunxi-devel, or some other branch.

I can manually try to find out, by comparing the distance to each one
of those branches. But:

- it's costly.
- it can only be done manually, because a script wouldn't know that
  linux-next and sunxi-devel are better candidate branches than, say,
  emacs-24.
- as a consequence, front ends (bitbucket, cgit, gitweb, younameit)
  can't show this information.

So, I'd like to suggest that Git be changed so that the branch
tracking information is also maintained and made available for remote
branches. E.g. when you push a branch, the info about which is the
corresponding target branch (if any) would also be pushed to the
remote repository along with the actual new commits. Similarly, when
cloning a remote repository, that information should be copied locally
so you can ask Git about the relationship between those remote
branches you just downloaded.

Stefan
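For reference, the tracking relationship Git records today lives only in the tracking repository's own config, per branch. A fragment like the following is what the proposal would effectively publish alongside the branch (the branch name is taken from the example above; the upstream branch shown is a guess purely for illustration):

```ini
[branch "sunxi-codec"]
	remote = origin
	merge = refs/heads/sunxi-devel
```

Since this is plain per-branch config, nothing about it travels over push or clone today, which is exactly the gap the request points at.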
Re: [PATCH] init - Honour the global core.filemode setting
On 2014-10-02 19.02, Junio C Hamano wrote:
> Torsten Bögershausen <tbo...@web.de> writes:
>
>> On 2014-10-01 19.10, Junio C Hamano wrote:
>>> Hilco Wijbenga <hilco.wijbe...@gmail.com> writes:
>>>
>>>> Perhaps I completely misunderstand the meaning of core.filemode
>>>> but I thought it determined whether Git cared about changes in
>>>> file properties?
>>>
>>> By setting it to false, you tell Git that the filesystem you placed
>>> the repository on does not correctly represent the filemode
>>> (especially the executable bit).
>>
>> core.fileMode in "git config --help" reads:
>>
>>     core.fileMode
>>         If false, the executable bit differences between the index
>>         and the working tree are ignored; useful on broken
>>         filesystems like FAT. See git-update-index(1).
>>
>> Out of my head: could the following be a starting point:
>>
>>     core.fileMode
>>         If false, the executable bit differences between the index
>>         and the working tree are ignored. This may be useful when
>>         visiting a cygwin repo with a non-cygwin Git client.
>>
>> (should we mention msysgit? should we mention JGit/EGit?)
>
> Between these two sentences, there may still be the same cognitive
> gap that may have led to the original confusion. The first sentence
> says what happens, as it should. But it is not directly clear what
> makes the executable bit differ and when it is a useful thing to
> ignore the differences, so the second sentence that says "This may be
> useful" does not give the reader very much.

Clearly a major improvement.
Does this (still) include the original line
"See linkgit:git-update-index[1]",
which helps the user to add *.sh files executable to the index, even if
core.filemode is false?

One minor improvement below.

> Here is my attempt.
>
>     Tells Git if the executable bit of files in the working tree is
>     to be honored.
>
>     Some filesystems lose the executable bit when a file that is
>     marked as executable is checked out, or check out a
>     non-executable file with the executable bit on. git init and git
>     clone probe the filesystem to see if it records the executable
>     bit correctly when they create a new repository, and this
>     variable is automatically set as necessary.
>
>     A repository, however, may be on a filesystem that records the
>     filemode correctly, and this variable is set to 'true' when
>     created, but later may be made accessible from another
>     environment that loses the filemode (e.g. exporting ext4 via
>     CIFS mount, visiting a Cygwin managed repository with MsysGit).
>     In such a case, it may be necessary to set this variable to
>     'false'.
Re: [PATCH] init - Honour the global core.filemode setting
Torsten Bögershausen <tbo...@web.de> writes:

>> The first sentence says what happens, as it should. But it is not
>> directly clear what makes the executable bit differ and when it is a
>> useful thing to ignore the differences, so the second sentence that
>> says "This may be useful" does not give the reader very much.
>
> Clearly a major improvement.
> Does this (still) include the original line
> "See linkgit:git-update-index[1]",
> which helps the user to add *.sh files executable to the index, even
> if core.filemode is false?

Thanks for catching; I omitted the reference by accident. Perhaps end
that sentence like this instead?

    ... may be necessary to set this variable to `false`, and
    maintain the executable bit with `git update-index --chmod`
    (see linkgit:git-update-index[1]).
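The escape hatch being referenced can be tried end to end in a throwaway repository. The file name below is made up for the example; the point is that with core.fileMode set to false, `git update-index --chmod=+x` records the executable bit in the index explicitly, since the working tree can no longer be trusted for it:

```shell
#!/bin/sh
set -e
# Throwaway repository demonstrating --chmod with core.fileMode=false.
dir=$(mktemp -d)
cd "$dir"
git init -q
git config core.fileMode false
echo 'echo hello' >run.sh
git add run.sh
git update-index --chmod=+x run.sh
git ls-files --stage run.sh    # index now records mode 100755
```

The `ls-files --stage` output shows mode 100755 for run.sh even though the on-disk permission bits were never consulted.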
Re: [PATCH/RFC 5/5] add tests for checking the behaviour of unset.variable
Matthieu Moy <matthieu@grenoble-inp.fr> writes:

> Junio C Hamano <gits...@pobox.com> writes:
>
>> ... Off the top of my head, from an end-user's point of view,
>> something like this would give a behaviour that is at least
>> understandable:

Let's make sure we have the same starting point (i.e. understanding of
the limitation of the current code) in mind.

The "git config [--add] section.var value" UI, which does not give the
user any control over where the new definition goes in the resulting
file, and the current implementation, which finds the last existing
section and adds the "var = value" definition at the end (or adds a
section at the end and then adds the definition there), are adequate
for ordinary variables.

It is fine for single-valued ones that follow the "last one wins"
semantics; "git config" would add the new definition at the end and
that definition will win.

It is manageable for multi-valued variables, too. For uses of a
multi-valued variable to track an unordered set of values, by
definition the order does not matter.

    Side note: otherwise, the scripter needs to read the existing
    ones, --unset-all them, and then add the elements of the final
    list in the desired order. It is cumbersome, but for a single
    multi-valued variable it is manageable.

But the unset.variable by nature wants a lot finer-grained control over
the order in which other ordinary variables are defined and it itself
is placed. No matter what improvements you attempt to make to the
implementation, because the UI is not getting enough information from
the user to learn where exactly the user wants to add a new variable or
unset.variable, it would not give you enough flexibility to do the job.

>> (1) forbid the "git config" command line from touching unset.var, as
>>     there is no way for a user to control where a new unset.var
>>     goes. And
>
> Well, the normal use-case for unset.variable is to put it in a local
> config file, to unset a variable set in another, lower-priority file.

I agree that is one major use case.
> This common use-case works with the command-line "git config", and it
> would be a pity to forbid the common use-case because of a
> particular, unusual case.

Either you are being incoherent or I am not reading you right. If you
said "If this common use-case worked with the command-line 'git
config', it would be nice, but it would be a pity because it does
not", I would understand.

If you use the command line 'git config', i.e.

    git config unset.variable xyzzy.frotz

in a repository whose .git/config does not have any unset.variable,
you will add that _at the end_, which would undo what you did in your
configuration file, not just what came before yours. Even if you
ignore more exotic cases, the command line is *not* working. That is
why I said unset.variable is unworkable with the existing "git config"
command line.

Always appending at the end is usable for ordinary variables, but for
unset.variable, it is most likely the least useful thing to do. You
can explain "among 47 different things it could do, we chose to do the
most useless thing, because that is _consistent_ with how the ordinary
variables are placed in the configuration file" in the documentation,
but it forgets to question if unset.variable should be treated the
same way as ordinary variables in the first place.

Another use case would be to override what includes gave us, i.e.

    [unset]
        variable = ... nullify some /etc/gitconfig values ...
    [include]
        path = ~/.gitcommon
    [unset]
        variable = ... nullify some ~/.gitcommon values ...
    [xyzzy]
        frotz = nitfol

Special-casing unset.variable and always adding them at the beginning
of the output *might* turn it from totally useless to slightly usable.
At least it supports the "nullify previous ones", even though it does
not help "nullify after include". I doubt if such a change to add
unset.variable always at the top is worth it, though.
>> (2) When adding or appending section.var (it may also apply to
>>     removing one--you need to think about it deeper), ignore
>>     everything that comes before the last appearance of unset.var
>>     that unsets the section.var variable.
>
> That would probably be the best option from a user's point of view,
> but I'd say the implementation complexity is not worth the trouble.

test_expect_success declares a particular behaviour as "the right
thing to happen" in order to protect that right behaviour from future
changes. It is OK for you to illustrate what illogical thing the
implementation does as a caveat, and admit that the behaviour comes
because we consider the change is not worth the trouble and we punted.
But pretending as if that is the right behaviour and the system has to
promise that the broken behaviour will be kept forever by casting it
in stone is the worst thing you can do, I am afraid.

Unfortunately, as I already said, there is no sensible behaviour to
Re: [PATCH/RFC 5/5] add tests for checking the behaviour of unset.variable
Junio C Hamano <gits...@pobox.com> writes:

> That is why I said unset.variable is unworkable with the existing
> "git config" command line. Always appending at the end is usable for
> ordinary variables, but for unset.variable, it is most likely the
> least useful thing to do. You can explain "among 47 different things
> it could do, we chose to do the most useless thing, because that is
> _consistent_ with how the ordinary variables are placed in the
> configuration file" in the documentation, but it forgets to question
> if unset.variable should be treated the same way as ordinary
> variables in the first place.

This is a tangent, but the above also applies to include.path.
Re: [PATCH 09/24] refs.c: pass a list of names to skip to is_refname_available
Junio C Hamano wrote:
> Jonathan Nieder <jrnie...@gmail.com> writes:
>
>> diff --git a/refs.c b/refs.c
>> index f124c2b..6820c93 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -801,14 +801,16 @@ static int names_conflict(const char *refname1, const char *refname2)
>>  struct name_conflict_cb {
>>  	const char *refname;
>> -	const char *oldrefname;
>>  	const char *conflicting_refname;
>> +	struct string_list *skiplist;
>>  };
>
> As cbe73331 (refs: speed up is_refname_available, 2014-09-10) touches
> the same area and is now in 'master', the logic around here in this
> series needs to be reworked.

Thanks for the heads up. (I hadn't realized the ref-transaction-1 was
part of 'master' --- yay!)

Rebased and reworked: https://code-review.googlesource.com/1027

I'll give a week or so for review comments on this version of the
series and then post a hopefully final version.
Re: [RFC/PATCH 1/2] config: Add safe-include directive
Junio C Hamano <gits...@pobox.com> writes:

Even though I did allude to ../project.gitconfig in the original
message, I think there should probably be an explicit syntax to name a
path that is relative to the root level of the working tree. People do
funky things using $GIT_DIR and $GIT_WORK_TREE to break the '".."
relative to the config file is the root level of the working tree'
assumption, and also a repository can have a regular file .git that
points at the real location of the directory that has "config" in it,
in which case its parent directory is very unlikely to be the root
level of the working tree.

There is another reason why I suspect that it may make the resulting
system more useful if we had a way to explicitly mark the path you
used to safeInclude (by the way, we do not do dashes in names for
configuration by convention) as referring to something inside the
project's working tree. In a bare repository, we might want to grab
the blob at that path in HEAD (i.e. the project's primary branch) and
include its contents.
Re: [PATCH 09/24] refs.c: pass a list of names to skip to is_refname_available
Jonathan Nieder <jrnie...@gmail.com> writes:

> Junio C Hamano wrote:
> ...
>> As cbe73331 (refs: speed up is_refname_available, 2014-09-10)
>> touches the same area and is now in 'master', the logic around here
>> in this series needs to be reworked.
>
> Thanks for the heads up. (I hadn't realized the ref-transaction-1 was
> part of 'master' --- yay!)
>
> Rebased and reworked: https://code-review.googlesource.com/1027

Heh, I was sort of expecting that heavy-weight contributors that throw
many and/or large patch series would at least be paying attention to
"What's cooking" reports. Will take a look over there (I really hate
having to context switch only to review this series, though).

> I'll give a week or so for review comments on this version of the
> series and then post a hopefully final version.

Thanks.
Re: [TOY PATCH]: rebase: Add --show-files option
Nazri Ramliy <ayieh...@gmail.com> writes:

> When rebasing these upwards of 40+ commits, it is helpful if the
> rebase instruction sheet shows me the actual files that the commits
> affect, so I made this patch (sorry I couldn't attach it inline since
> gmail eats all the tabs) that adds the --show-files option to
> git-rebase to achieve something to this effect:
>
>   pick 996fa59 Remove autoconf submodule
>   #    :100644 100644 cfc8a25... 28ddb02... M  .gitmodules
>   #    :160000 000000 0263a9f... 000...    D   autoconf
>   ... more pick lines
>   pick 4c5070f Remove automake submodule
>   #    :100644 100644 28ddb02... f907328... M  .gitmodules
>   #    :160000 000000 9042530... 000...    D   automake
>
> Having the list of files shown below each commit, indented to reduce
> cluttering the pick instruction, really does help in deciding the
> reorder and squash candidates.

Sounds like a good idea to give helpful information in a comment form
to the insn sheet. Other than two minor points:

- If I were doing this, I would have used "diff-tree --stat --summary"
  instead of "show --raw". You can tell deletion/addition by paying
  attention to 0's and also mode changes, but the information density
  of --raw for human consumption is rather low.

- Regardless of the above, I am not sure if dumping a listing of 100+
  paths modified would really help, and it might make sense to cap the
  number of paths displayed for each change.

I didn't look at your implementation at all, though.
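The second point above, capping the per-commit file list, can be sketched with a few lines of awk. The instruction-sheet format here is simplified (a pick line followed by commented file lines), and max=3 is an arbitrary cap chosen for the example:

```shell
#!/bin/sh
# Keep at most "max" commented file lines under each pick line.
cap_files() {
    awk -v max=3 '
        /^#/ { if (++n <= max) print; next }   # commented file line
        { n = 0; print }                       # pick line: reset counter
    '
}

printf '%s\n' 'pick 1111111 subject' '# file-a' '# file-b' '# file-c' '# file-d' |
    cap_files
```

A fuller version might also print a final "# ... and N more" line, but the counter-reset idea is the whole trick.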
Re: [PATCH/RFC 5/5] add tests for checking the behaviour of unset.variable
Junio C Hamano <gits...@pobox.com> writes:

> The "git config [--add] section.var value" UI, [...] finds the
> "var = value" definition at the end (or adds a section at the end and
> then adds [...]
>
> It is fine for single-valued ones that follow the "last one wins"
> semantics; "git config" would add the new definition at the end and
> that definition will win.

Not always:

    git config foo.bar old-value
    git config unset.variable foo.bar
    git config foo.bar new-value

One could expect the new value to be taken into account, but it is
not.

>> Well, the normal use-case for unset.variable is to put it in a local
>> config file, to unset a variable set in another, lower-priority
>> file.
>
> I agree that is one major use case.
>
>> This common use-case works with the command-line "git config", and
>> it would be a pity to forbid the common use-case because of a
>> particular, unusual case.
>
> Either you are being incoherent or I am not reading you right. If you
> said "If this common use-case worked with the command-line 'git
> config', it would be nice, but it would be a pity because it does
> not", I would understand.

I think you missed the "another" in my sentence above. The normal
use-case is to have foo.bar and unset.variable=foo.bar in different
files. In this case, you do not care about the position in the file.

> in a repository whose .git/config does not have any unset.variable,
> you will add that _at the end_, which would undo what you did in your
> configuration file, not just what came before yours. Even if you
> ignore more exotic cases, the command line is *not* working.

If my sysadmin has set foo.bar=boz in /etc/gitconfig, I can use

    git config [--global] unset.variable foo.bar

and it does work. Always. Playing with the order of variables in-file
is essentially useless OTOH, except for the include case you mentioned
(if I want to unset a variable in a file, I'll just delete or comment
out the variable, and I don't need unset.variable).

Really, I don't see the point in making any complex plans to support
the useless part of the unset.variable feature. The reason it was
designed for already works, and $EDITOR does the job for other cases.

--
Matthieu Moy  http://www-verimag.imag.fr/~moy/
Re: [PATCH/RFC 5/5] add tests for checking the behaviour of unset.variable
Matthieu Moy <matthieu@grenoble-inp.fr> writes:

> Junio C Hamano <gits...@pobox.com> writes:
>
>> The "git config [--add] section.var value" UI, [...] finds the
>> "var = value" definition at the end (or adds a section at the end
>> and then adds [...]
>>
>> It is fine for single-valued ones that follow the "last one wins"
>> semantics; "git config" would add the new definition at the end and
>> that definition will win.
>
> Not always:
>
>     git config foo.bar old-value
>     git config unset.variable foo.bar
>     git config foo.bar new-value
>
> One could expect the new value to be taken into account, but it is
> not.

I think you misunderstood what I said. With ordinary variables,
everything works fine, that is, without unset.variable, which this
series is trying to pretend as if it were just another ordinary
variable, but it is not. You are just showing how broken it is to
treat unset.variable as just another ordinary variable, which is what
I've been telling you. So we are in agreement.

>>> Well, the normal use-case for unset.variable is to put it in a
>>> local config file, to unset a variable set in another,
>>> lower-priority file.
>>
>> I agree that is one major use case.
>>
>>> This common use-case works with the command-line "git config", and
>>> it would be a pity to forbid the common use-case because of a
>>> particular, unusual case.
>>
>> Either you are being incoherent or I am not reading you right. If
>> you said "If this common use-case worked with the command-line 'git
>> config', it would be nice, but it would be a pity because it does
>> not", I would understand.
>
> I think you missed the "another" in my sentence above. The normal
> use-case is to have foo.bar and unset.variable=foo.bar in different
> files. In this case, you do not care about the position in file.

I didn't miss anything. The reason you want to have "unset foo.bar" in
your .git/config is to override a "foo.bar = z" you have in another
place, e.g. ~/.gitconfig; which would let you sanely do another
"foo.bar = x" in .git/config for multi-value foo.bar, *if* the unset
comes before foo.bar.
But what if you have this in your .git/config file

  [core]
          bare = false
          ... standard stuff left by git init ...
  [foo]
          bar = x

before you add "unset foo.bar" there? And that is not a contrived example. The likely reason why you want to resort to "unset foo.bar" is that you found that you get an unwanted foo.bar=z in addition to the desired foo.bar=x in a repository that has the above in .git/config, and at that point you would want to say "I want to unset the outside influence". And there is no "[unset] variable = foo.bar" in there yet; without some special casing, if you treated unset.variable as if it were just another ordinary variable, it would go after you define your own foo.bar=x, which is not what you want.
--
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 01/16] foreach_alt_odb: propagate return value from callback
We check the return value of the callback and stop iterating if it is non-zero. However, we do not make the non-zero return value available to the caller, so they have no way of knowing whether the operation succeeded or not (technically they can keep their own error flag in the callback data, but that is unlike our other for_each functions).

Signed-off-by: Jeff King p...@peff.net
---
 cache.h     |  2 +-
 sha1_file.c | 12 ++++++++----
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index 8206039..cd16e25 100644
--- a/cache.h
+++ b/cache.h
@@ -1143,7 +1143,7 @@ extern void prepare_alt_odb(void);
 extern void read_info_alternates(const char * relative_base, int depth);
 extern void add_to_alternates_file(const char *reference);
 typedef int alt_odb_fn(struct alternate_object_database *, void *);
-extern void foreach_alt_odb(alt_odb_fn, void*);
+extern int foreach_alt_odb(alt_odb_fn, void*);
 
 struct pack_window {
 	struct pack_window *next;
diff --git a/sha1_file.c b/sha1_file.c
index c08c0cb..bae1c15 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -412,14 +412,18 @@ void add_to_alternates_file(const char *reference)
 	link_alt_odb_entries(alt, strlen(alt), '\n', NULL, 0);
 }
 
-void foreach_alt_odb(alt_odb_fn fn, void *cb)
+int foreach_alt_odb(alt_odb_fn fn, void *cb)
 {
 	struct alternate_object_database *ent;
+	int r = 0;
 
 	prepare_alt_odb();
-	for (ent = alt_odb_list; ent; ent = ent->next)
-		if (fn(ent, cb))
-			return;
+	for (ent = alt_odb_list; ent; ent = ent->next) {
+		r = fn(ent, cb);
+		if (r)
+			break;
+	}
+	return r;
 }
 
 void prepare_alt_odb(void)
-- 
2.1.1.566.gdb1f904
[PATCH 0/16] make prune mtime-checking more careful
At GitHub we've occasionally run across repos getting corrupted by trees and blobs near the tip going missing. We do a lot of test merges between branches and HEAD (this is what feeds the "OK to merge" button on the web interface), and the objects are almost always related to these merges. The objects are removed by prune, which doesn't realize that they are part of an ongoing operation. Prune uses the filesystem mtime to determine this, but we are not very thorough in making sure that is kept up to date. This series tries to fix that with two techniques:

  1. When we try to write an object to disk, we optimize out the write if we already have the object. However, we should still update the mtime of the existing object in this case.

  2. Treat objects reachable from recent objects as recent themselves. When we check that we have an object, we do not check whether we have all of the objects it can reach. If you have some new objects that refer to some old objects (e.g., you create and delete a tree containing a blob on day 1, and then create a new tree referring to the blob on day 2), then prune may delete the old object but not the new (in this case, we delete the blob but not the tree). Any subsequent use of the new object will check that we have it (e.g., commit-tree makes sure we have the tree we feed it), but not other objects it can reach. This can lead to referencing a half-formed part of the graph.

Note that this does not make prune race-free. For example, you could check for and update the mtime of an object just as prune is deleting it, and think that it is written when it is not. Fixing that would require some atomic mechanism for prune to check the mtime and delete. But I do think this series cuts us down to real race conditions, with millisecond-ish timing. The problems we're fixing here are much worse than that. The distance between an object being written and being referenced may operate on human timescales (e.g., writing a commit message).
Or the time distance between two objects that refer to each other may be days or weeks; a prune where one falls inside the recent boundary and the other does not can be disastrous.

There's quite a lot of patches here, but most of them are preparatory cleanups. The meat is in patches 13, 15, and 16.

  [01/16]: foreach_alt_odb: propagate return value from callback
  [02/16]: isxdigit: cast input to unsigned char
  [03/16]: object_array: factor out slopbuf-freeing logic
  [04/16]: object_array: add a clear function
  [05/16]: clean up name allocation in prepare_revision_walk
  [06/16]: reachable: clear pending array after walking it
  [07/16]: t5304: use test_path_is_* instead of test -f
  [08/16]: t5304: use helper to report failure of "test foo = bar"
  [09/16]: prune: factor out loose-object directory traversal
  [10/16]: count-objects: do not use xsize_t when counting object size
  [11/16]: count-objects: use for_each_loose_file_in_objdir
  [12/16]: sha1_file: add for_each iterators for loose and packed objects
  [13/16]: prune: keep objects reachable from recent objects
  [14/16]: pack-objects: refactor unpack-unreachable expiration check
  [15/16]: pack-objects: match prune logic for discarding objects
  [16/16]: write_sha1_file: freshen existing objects

Note that these aren't yet running on GitHub servers. I know that they fix real potential problems (see the new t6501 for examples), but I don't know for sure if they will catch the problems we have seen. These issues occur relatively rarely, so even once deployed, we won't know for sure until a few weeks or months have passed.

-Peff
[PATCH 03/16] object_array: factor out slopbuf-freeing logic
This is not a lot of code, but it's a logical construct that should not need to be repeated (and we are about to add a third repetition).

Signed-off-by: Jeff King p...@peff.net
---
 object.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/object.c b/object.c
index ca9d790..14238dc 100644
--- a/object.c
+++ b/object.c
@@ -355,6 +355,12 @@ void add_object_array_with_context(struct object *obj, const char *name, struct
 	add_object_array_with_mode_context(obj, name, array, S_IFINVALID, context);
 }
 
+static void object_array_release_entry(struct object_array_entry *ent)
+{
+	if (ent->name != object_array_slopbuf)
+		free(ent->name);
+}
+
 void object_array_filter(struct object_array *array,
 			 object_array_each_func_t want, void *cb_data)
 {
@@ -367,8 +373,7 @@ void object_array_filter(struct object_array *array,
 			objects[dst] = objects[src];
 			dst++;
 		} else {
-			if (objects[src].name != object_array_slopbuf)
-				free(objects[src].name);
+			object_array_release_entry(&objects[src]);
 		}
 	}
 	array->nr = dst;
@@ -400,8 +405,7 @@ void object_array_remove_duplicates(struct object_array *array)
 			objects[array->nr] = objects[src];
 			array->nr++;
 		} else {
-			if (objects[src].name != object_array_slopbuf)
-				free(objects[src].name);
+			object_array_release_entry(&objects[src]);
 		}
 	}
 }
-- 
2.1.1.566.gdb1f904
[PATCH 04/16] object_array: add a clear function
There's currently no easy way to free the memory associated with an object_array (and in most cases, we simply leak the memory in a rev_info's pending array). Let's provide a helper to make this easier to handle. We can make use of it in list-objects.c, which does the same thing by hand (but fails to free the name field of each entry, potentially leaking memory).

Signed-off-by: Jeff King p...@peff.net
---
 list-objects.c |  7 +------
 object.c       | 10 ++++++++++
 object.h       |  6 ++++++
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 3595ee7..fad6808 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -228,11 +228,6 @@ void traverse_commit_list(struct rev_info *revs,
 			die("unknown pending object %s (%s)",
 			    sha1_to_hex(obj->sha1), name);
 	}
-	if (revs->pending.nr) {
-		free(revs->pending.objects);
-		revs->pending.nr = 0;
-		revs->pending.alloc = 0;
-		revs->pending.objects = NULL;
-	}
+	object_array_clear(&revs->pending);
 	strbuf_release(&base);
 }
diff --git a/object.c b/object.c
index 14238dc..e36e33d 100644
--- a/object.c
+++ b/object.c
@@ -379,6 +379,16 @@ void object_array_filter(struct object_array *array,
 	array->nr = dst;
 }
 
+void object_array_clear(struct object_array *array)
+{
+	int i;
+	for (i = 0; i < array->nr; i++)
+		object_array_release_entry(&array->objects[i]);
+	free(array->objects);
+	array->objects = NULL;
+	array->nr = array->alloc = 0;
+}
+
 /*
  * Return true iff array already contains an entry with name.
  */
diff --git a/object.h b/object.h
index e028ced..2a755a2 100644
--- a/object.h
+++ b/object.h
@@ -133,6 +133,12 @@ void object_array_filter(struct object_array *array,
  */
 void object_array_remove_duplicates(struct object_array *array);
 
+/*
+ * Remove any objects from the array, freeing all used memory; afterwards
+ * the array is ready to store more objects with add_object_array().
+ */
+void object_array_clear(struct object_array *array);
+
 void clear_object_flags(unsigned flags);
 
 #endif /* OBJECT_H */
-- 
2.1.1.566.gdb1f904
[PATCH 02/16] isxdigit: cast input to unsigned char
Otherwise, callers must do so or risk triggering -Wchar-subscript (and rightfully so; a signed char might cause us to use a bogus negative index into the hexval_table). While we are dropping the now-unnecessary casts from the caller in urlmatch.c, we can get rid of similar casts in actually parsing the hex by using the hexval() helper, which implicitly casts to unsigned (but note that we cannot implement isxdigit in terms of hexval(), as it also casts its return value to unsigned).

Signed-off-by: Jeff King p...@peff.net
---
 git-compat-util.h | 2 +-
 urlmatch.c        | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 0c4e663..3d5a8fb 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -677,7 +677,7 @@ extern const unsigned char sane_ctype[256];
 #define iscntrl(x) (sane_istest(x,GIT_CNTRL))
 #define ispunct(x) sane_istest(x, GIT_PUNCT | GIT_REGEX_SPECIAL | \
 		GIT_GLOB_SPECIAL | GIT_PATHSPEC_MAGIC)
-#define isxdigit(x) (hexval_table[x] != -1)
+#define isxdigit(x) (hexval_table[(unsigned char)(x)] != -1)
 #define tolower(x) sane_case((unsigned char)(x), 0x20)
 #define toupper(x) sane_case((unsigned char)(x), 0)
 #define is_pathspec_magic(x) sane_istest(x,GIT_PATHSPEC_MAGIC)
diff --git a/urlmatch.c b/urlmatch.c
index 3d4c54b..618d216 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -43,11 +43,11 @@ static int append_normalized_escapes(struct strbuf *buf,
 			from_len--;
 			if (ch == '%') {
 				if (from_len < 2 ||
-				    !isxdigit((unsigned char)from[0]) ||
-				    !isxdigit((unsigned char)from[1]))
+				    !isxdigit(from[0]) ||
+				    !isxdigit(from[1]))
 					return 0;
-				ch = hexval_table[(unsigned char)*from++] << 4;
-				ch |= hexval_table[(unsigned char)*from++];
+				ch = hexval(*from++) << 4;
+				ch |= hexval(*from++);
 				from_len -= 2;
 				was_esc = 1;
 			}
-- 
2.1.1.566.gdb1f904
[PATCH 05/16] clean up name allocation in prepare_revision_walk
When we enter prepare_revision_walk, we have zero or more entries in our pending array. We disconnect that array from the rev_info, and then process each entry:

  1. If the entry is a commit and the --source option is in effect, we keep a pointer to the object name.

  2. Otherwise, we re-add the item to the pending list with a blank name.

We then throw away the old array by freeing the array itself, but do not touch the name field of each entry. For any items of type (2), we leak the memory associated with the name. This commit fixes that by calling object_array_clear, which handles the cleanup for us.

That breaks (1), though, because it depends on the memory pointed to by the name to last forever. We can solve that by making a copy of the name. This is slightly less efficient, but it shouldn't matter in practice, as we do it only for the tip commits of the traversal.

Signed-off-by: Jeff King p...@peff.net
---
 revision.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/revision.c b/revision.c
index e498b7c..01cc276 100644
--- a/revision.c
+++ b/revision.c
@@ -300,7 +300,7 @@ static struct commit *handle_commit(struct rev_info *revs,
 		revs->limited = 1;
 	}
 	if (revs->show_source && !commit->util)
-		commit->util = (void *) name;
+		commit->util = xstrdup(name);
 	return commit;
 }
 
@@ -2656,15 +2656,16 @@ void reset_revision_walk(void)
 
 int prepare_revision_walk(struct rev_info *revs)
 {
-	int nr = revs->pending.nr;
-	struct object_array_entry *e, *list;
+	int i;
+	struct object_array old_pending;
 	struct commit_list **next = &revs->commits;
 
-	e = list = revs->pending.objects;
+	memcpy(&old_pending, &revs->pending, sizeof(old_pending));
 	revs->pending.nr = 0;
 	revs->pending.alloc = 0;
 	revs->pending.objects = NULL;
-	while (--nr >= 0) {
+	for (i = 0; i < old_pending.nr; i++) {
+		struct object_array_entry *e = old_pending.objects + i;
 		struct commit *commit = handle_commit(revs, e->item, e->name);
 		if (commit) {
 			if (!(commit->object.flags & SEEN)) {
@@ -2672,10 +2673,9 @@ int prepare_revision_walk(struct rev_info *revs)
 				next = commit_list_append(commit, next);
 			}
 		}
-		e++;
 	}
 	if (!revs->leak_pending)
-		free(list);
+		object_array_clear(&old_pending);
 
 	/* Signal whether we need per-parent treesame decoration */
 	if (revs->simplify_merges ||
-- 
2.1.1.566.gdb1f904
[PATCH 06/16] reachable: clear pending array after walking it
We add a number of objects to our pending array, and then process it with a combination of get_revision and walking the pending array ourselves (to catch any non-commits). The commits in the pending array are cleaned up automatically by prepare_revision_walk, but we essentially leak any other objects (they are technically still reachable from the rev_info, but no callers ever look at them or bother to clean them up).

This is not a huge deal in practice, as the number of non-commits tends to be small. However, a future patch will broaden this considerably. Let's call object_array_clear to free the memory.

Signed-off-by: Jeff King p...@peff.net
---
 reachable.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/reachable.c b/reachable.c
index 6f6835b..d99bd31 100644
--- a/reachable.c
+++ b/reachable.c
@@ -131,6 +131,8 @@ static void walk_commit_list(struct rev_info *revs,
 		die("unknown pending object %s (%s)",
 		    sha1_to_hex(obj->sha1), name);
 	}
+
+	object_array_clear(&revs->pending);
 }
 
 static int add_one_reflog_ent(unsigned char *osha1, unsigned char *nsha1,
-- 
2.1.1.566.gdb1f904
Re: [PATCH 19/24] refs.c: allow listing and deleting badly named refs
On Thu, Oct 2, 2014 at 11:55 AM, Junio C Hamano gits...@pobox.com wrote: Jonathan Nieder jrnie...@gmail.com writes: From: Ronnie Sahlberg sahlb...@google.com ... In resolving functions, refuse to resolve refs that don't pass the check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME flag is passed. Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to resolve refs that escape the refs/ directory and do not match the pattern [A-Z_]* (think HEAD and MERGE_HEAD). In locking functions, refuse to act on badly named refs unless they are being deleted and either are in the refs/ directory or match [A-Z_]*. Just like other invalid refs, flag resolved, badly named refs with the REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them in all iteration functions except for for_each_rawref. Flag badly named refs with a REF_BAD_NAME flag to make it easier for future callers to notice and handle them specially. In the transaction API, refuse to create or update badly named refs, but allow deleting them (unless they escape refs/ and don't match [A-Z_]*). Signed-off-by: Ronnie Sahlberg sahlb...@google.com Signed-off-by: Jonathan Nieder jrnie...@gmail.com Thanks. We originally threw all the different kind of breakages into ISBROKEN, but a ref can have a malformed name or can contain a bad/non value and allowing us to tell them apart is a good direction to go. diff --git a/cache.h b/cache.h index 5ca7f2b..0c0ac60 100644 --- a/cache.h +++ b/cache.h @@ -978,16 +978,26 @@ extern int read_ref(const char *refname, unsigned char *sha1); * If flags is non-NULL, set the value that it points to the * combination of REF_ISPACKED (if the reference was found among the * packed references), REF_ISSYMREF (if the initial reference was a - * symbolic reference) and REF_ISBROKEN (if the ref is malformed). + * symbolic reference), REF_BAD_NAME (if the reference name is ill + * formed --- see RESOLVE_REF_ALLOW_BAD_NAME below), and REF_ISBROKEN + * (if the ref is malformed). 
You want to define "is malformed" here. The original defines REF_ISBROKEN as "malformed" because (1) resolve_ref_unsafe() uses get_sha1_hex() and read_loose_refs() uses read_ref_full(), both to catch malformed values stored; (2) resolve_ref_unsafe() uses check_refname_format() and catches malformed names stored as a symref target. I _think_ you are introducing ALLOW_BAD_NAME to add the third class, .git/refs/remotes/origin/mal..formed..name. I do not know if they should be the same class as a symref with a good name .git/refs/remotes/origin/HEAD that points at a bad name mal..formed..name (which is (2) above). Perhaps not. (2) is still about what is stored in the ref, and the ref in question may or may not have a well-formed name, which is orthogonal. So probably you left only the "value stored in the ref is malformed" (in other words, we expected to find a 40-hex object name but didn't find one) case for REF_ISBROKEN?

I updated cache.h to try to clarify it better. The intention here is to expand the use of REF_ISBROKEN: it will be set in all cases, both when the sha1 value is bad and when a name is bad. For those cases where a name is bad, REF_BAD_NAME is also set. A name is bad when either the ref itself has a bad name or a bad name is encountered while resolving a chain of symbolic refs. I.e., REF_BAD_NAME is a special case of a broken ref: all REF_BAD_NAME refs are also REF_ISBROKEN, but not the reverse. Do we want to separate "value is not 40-hex" and "a symref points at a malformed refname" as distinct malformed-value errors?

+ * RESOLVE_REF_ALLOW_BAD_NAME allows resolving refs even when their
+ * name is invalid according to git-check-ref-format(1). If the name
+ * is bad then the value stored in sha1 will be null_sha1 and the
+ * REF_ISBROKEN and REF_BAD_NAME flags will be set.
Viewed with that light, I am not sure if a badly named ref should yield null_sha1[] (REF_ISBROKEN, which I am assuming is about a value that is badly formatted and cannot be read, should keep yielding it as before). Wouldn't it make it harder for the user if you give null_sha1[] back to somebody who is trying to recover by reading refs/heads/mal..formed, creating refs/heads/sensible to point at the same value and then removing the former? Note that I am not saying we should give back the parsed value at this step in the series. Perhaps there are some existing callers that do not check for ISBROKEN flag and instead says null_sha1[] ref is to be rejected/ignored, in which case they may need to be corrected before that happens. Or there may be some reason I overlooked that makes it not so useful if we returned the object name stored in a ref whose name is malformed. Just wondering. The reason for these malformed refs resolving to null_sha1 is exactly that. There may be callers that do not check ISBROKEN, so all those callers
[PATCH 07/16] t5304: use test_path_is_* instead of test -f
This is slightly more robust (checking "! test -f" would not notice a directory of the same name, though that is not likely to happen here). It also makes debugging easier, as the test script will output a message on failure.

Signed-off-by: Jeff King p...@peff.net
---
This patch is totally optional. I did it while debugging t5304 (strange how badly prune works when you accidentally invert the mtime check!) and figured it might be worth keeping as a cleanup.

 t/t5304-prune.sh | 46 +++++++++++++++++++++++-----------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 01c6a3f..b0ffb05 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -14,7 +14,7 @@ add_blob() {
 	BLOB=$(echo aleph_0 | git hash-object -w --stdin) &&
 	BLOB_FILE=.git/objects/$(echo $BLOB | sed "s/^../&\//") &&
 	test $((1 + $before)) = $(git count-objects | sed "s/ .*//") &&
-	test -f $BLOB_FILE &&
+	test_path_is_file $BLOB_FILE &&
 	test-chmtime =+0 $BLOB_FILE
 }
@@ -35,9 +35,9 @@ test_expect_success 'prune stale packs' '
 	: >.git/objects/tmp_2.pack &&
 	test-chmtime =-86501 .git/objects/tmp_1.pack &&
 	git prune --expire 1.day &&
-	test -f $orig_pack &&
-	test -f .git/objects/tmp_2.pack &&
-	! test -f .git/objects/tmp_1.pack
+	test_path_is_file $orig_pack &&
+	test_path_is_file .git/objects/tmp_2.pack &&
+	test_path_is_missing .git/objects/tmp_1.pack
 '
@@ -46,11 +46,11 @@ test_expect_success 'prune --expire' '
 	add_blob &&
 	git prune --expire=1.hour.ago &&
 	test $((1 + $before)) = $(git count-objects | sed "s/ .*//") &&
-	test -f $BLOB_FILE &&
+	test_path_is_file $BLOB_FILE &&
 	test-chmtime =-86500 $BLOB_FILE &&
 	git prune --expire 1.day &&
 	test $before = $(git count-objects | sed "s/ .*//") &&
-	! test -f $BLOB_FILE
+	test_path_is_missing $BLOB_FILE
 '
@@ -60,11 +60,11 @@ test_expect_success 'gc: implicit prune --expire' '
 	test-chmtime =-$((2*$week-30)) $BLOB_FILE &&
 	git gc &&
 	test $((1 + $before)) = $(git count-objects | sed "s/ .*//") &&
-	test -f $BLOB_FILE &&
+	test_path_is_file $BLOB_FILE &&
 	test-chmtime =-$((2*$week+1)) $BLOB_FILE &&
 	git gc &&
 	test $before = $(git count-objects | sed "s/ .*//") &&
-	! test -f $BLOB_FILE
+	test_path_is_missing $BLOB_FILE
 '
@@ -110,7 +110,7 @@ test_expect_success 'prune: do not prune detached HEAD with no reflog' '
 	git commit --allow-empty -m "detached commit" &&
 	# verify that there is no reflogs
 	# (should be removed and disabled by previous test)
-	test ! -e .git/logs &&
+	test_path_is_missing .git/logs &&
 	git prune -n >prune_actual &&
 	: >prune_expected &&
 	test_cmp prune_actual prune_expected
@@ -145,7 +145,7 @@ test_expect_success 'gc --no-prune' '
 	git config gc.pruneExpire 2.days.ago &&
 	git gc --no-prune &&
 	test 1 = $(git count-objects | sed "s/ .*//") &&
-	test -f $BLOB_FILE
+	test_path_is_file $BLOB_FILE
 '
@@ -153,10 +153,10 @@ test_expect_success 'gc respects gc.pruneExpire' '
 	git config gc.pruneExpire 5002.days.ago &&
 	git gc &&
-	test -f $BLOB_FILE &&
+	test_path_is_file $BLOB_FILE &&
 	git config gc.pruneExpire 5000.days.ago &&
 	git gc &&
-	test ! -f $BLOB_FILE
+	test_path_is_missing $BLOB_FILE
 '
@@ -165,9 +165,9 @@ test_expect_success 'gc --prune=date' '
 	add_blob &&
 	test-chmtime =-$((5001*$day)) $BLOB_FILE &&
 	git gc --prune=5002.days.ago &&
-	test -f $BLOB_FILE &&
+	test_path_is_file $BLOB_FILE &&
 	git gc --prune=5000.days.ago &&
-	test ! -f $BLOB_FILE
+	test_path_is_missing $BLOB_FILE
 '
@@ -175,9 +175,9 @@ test_expect_success 'gc --prune=never' '
 	add_blob &&
 	git gc --prune=never &&
-	test -f $BLOB_FILE &&
+	test_path_is_file $BLOB_FILE &&
 	git gc --prune=now &&
-	test ! -f $BLOB_FILE
+	test_path_is_missing $BLOB_FILE
 '
@@ -186,10 +186,10 @@ test_expect_success 'gc respects gc.pruneExpire=never' '
 	git config gc.pruneExpire never &&
 	add_blob &&
 	git gc &&
-	test -f $BLOB_FILE &&
+	test_path_is_file $BLOB_FILE &&
 	git config gc.pruneExpire now &&
 	git gc &&
-	test ! -f $BLOB_FILE
+	test_path_is_missing $BLOB_FILE
 '
@@ -197,9 +197,9 @@ test_expect_success 'prune --expire=never' '
 	add_blob &&
 	git prune --expire=never &&
-	test -f $BLOB_FILE &&
+	test_path_is_file $BLOB_FILE &&
 	git prune &&
-	test ! -f $BLOB_FILE
+	test_path_is_missing $BLOB_FILE
 '
@@ -210,10 +210,10 @@ test_expect_success 'gc: prune old objects after local clone' '
 	(
 		cd aclone &&
 		test 1 = $(git count-objects | sed "s/ .*//") &&
-		test -f $BLOB_FILE &&
+		test_path_is_file $BLOB_FILE &&
 		git gc --prune
[PATCH 08/16] t5304: use helper to report failure of test foo = bar
For small outputs, we sometimes use:

  test "$(some_cmd)" = "something we expect"

instead of a full test_cmp. The downside of this is that when it fails, there is no output at all from the script. Let's introduce a small helper to make tests easier to debug.

Signed-off-by: Jeff King p...@peff.net
---
This is in the same boat as the last commit; we can drop it without hurting the rest of the series. Is "test_eq" too cutesy or obfuscated? I have often wanted it when debugging other tests, too. Our usual technique is to do:

  echo whatever >expect &&
  do_something >actual &&
  test_cmp expect actual

That's a bit verbose. We could hide it behind something like test_eq, too, but it introduces several extra new processes. And I know people on some fork-challenged platforms are very sensitive to the number of spawned processes in the test suite.

 t/t5304-prune.sh        | 16 ++++++++--------
 t/test-lib-functions.sh | 11 +++++++++++
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index b0ffb05..502860e 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -13,7 +13,7 @@ add_blob() {
 	before=$(git count-objects | sed "s/ .*//") &&
 	BLOB=$(echo aleph_0 | git hash-object -w --stdin) &&
 	BLOB_FILE=.git/objects/$(echo $BLOB | sed "s/^../&\//") &&
-	test $((1 + $before)) = $(git count-objects | sed "s/ .*//") &&
+	test_eq $((1 + $before)) $(git count-objects | sed "s/ .*//") &&
 	test_path_is_file $BLOB_FILE &&
 	test-chmtime =+0 $BLOB_FILE
 }
@@ -45,11 +45,11 @@ test_expect_success 'prune --expire' '
 	add_blob &&
 	git prune --expire=1.hour.ago &&
-	test $((1 + $before)) = $(git count-objects | sed "s/ .*//") &&
+	test_eq $((1 + $before)) $(git count-objects | sed "s/ .*//") &&
 	test_path_is_file $BLOB_FILE &&
 	test-chmtime =-86500 $BLOB_FILE &&
 	git prune --expire 1.day &&
-	test $before = $(git count-objects | sed "s/ .*//") &&
+	test_eq $before $(git count-objects | sed "s/ .*//") &&
 	test_path_is_missing $BLOB_FILE
 '
@@ -59,11 +59,11 @@ test_expect_success 'gc: implicit prune --expire' '
 	add_blob &&
 	test-chmtime =-$((2*$week-30)) $BLOB_FILE &&
 	git gc &&
-	test $((1 + $before)) = $(git count-objects | sed "s/ .*//") &&
+	test_eq $((1 + $before)) $(git count-objects | sed "s/ .*//") &&
 	test_path_is_file $BLOB_FILE &&
 	test-chmtime =-$((2*$week+1)) $BLOB_FILE &&
 	git gc &&
-	test $before = $(git count-objects | sed "s/ .*//") &&
+	test_eq $before $(git count-objects | sed "s/ .*//") &&
 	test_path_is_missing $BLOB_FILE
 '
@@ -144,7 +144,7 @@ test_expect_success 'gc --no-prune' '
 	test-chmtime =-$((5001*$day)) $BLOB_FILE &&
 	git config gc.pruneExpire 2.days.ago &&
 	git gc --no-prune &&
-	test 1 = $(git count-objects | sed "s/ .*//") &&
+	test_eq 1 $(git count-objects | sed "s/ .*//") &&
 	test_path_is_file $BLOB_FILE
 '
@@ -209,10 +209,10 @@ test_expect_success 'gc: prune old objects after local clone' '
 	git clone --no-hardlinks . aclone &&
 	(
 		cd aclone &&
-		test 1 = $(git count-objects | sed "s/ .*//") &&
+		test_eq 1 $(git count-objects | sed "s/ .*//") &&
 		test_path_is_file $BLOB_FILE &&
 		git gc --prune &&
-		test 0 = $(git count-objects | sed "s/ .*//") &&
+		test_eq 0 $(git count-objects | sed "s/ .*//") &&
 		test_path_is_missing $BLOB_FILE
 	)
 '
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index dafd6ad..0a17614 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -634,6 +634,17 @@ test_cmp_bin() {
 	cmp "$@"
 }
 
+# This is the same as 'test "$1" "$3" "$2"' except that it
+# will output a useful message to stderr on failure. If
+# $3 is omitted, defaults to "=".
+test_eq () {
+	if ! test "$1" "${3:-=}" "$2"
+	then
+		echo >&2 "test_eq failed: $1 ${3:-=} $2"
+		false
+	fi
+}
+
 # Check if the file expected to be empty is indeed empty, and barfs
 # otherwise.
-- 
2.1.1.566.gdb1f904
[PATCH 09/16] prune: factor out loose-object directory traversal
Prune has to walk $GIT_DIR/objects/?? in order to find the set of loose objects to prune. Other parts of the code (e.g., count-objects) want to do the same. Let's factor it out into a reusable for_each-style function.

Note that this is not quite a straight code movement. There are two differences:

  1. The original code iterated from 0 to 256, trying to opendir("$GIT_DIR/%02x"). The new code just does a readdir() on the object directory, and descends into any matching directories. This is faster on already-pruned repositories, and should not ever be slower (nobody ever creates other files in the object directory).

  2. The original code had strange behavior when it found a file of the form "[0-9a-f]{2}/.{38}" that did _not_ contain all hex digits. It executed a "break" from the loop, meaning that we stopped pruning in that directory (but still pruned other directories!). This was probably a bug; we do not want to process the file as an object, but we should keep going otherwise.

Signed-off-by: Jeff King p...@peff.net
---
I admit the speedup in (1) almost certainly doesn't matter. It is real, and I found out about it while writing a different program that was basically count-objects across a large number of repositories. However, for a single repo it's probably not big enough to matter (calling count-objects in a loop will get dominated by the startup costs). The end result is a little more obvious IMHO, but that's subjective.
builtin/prune.c | 87 cache.h | 31 +++ sha1_file.c | 95 + 3 files changed, 152 insertions(+), 61 deletions(-) diff --git a/builtin/prune.c b/builtin/prune.c index 144a3bd..8286680 100644 --- a/builtin/prune.c +++ b/builtin/prune.c @@ -31,11 +31,23 @@ static int prune_tmp_file(const char *fullpath) return 0; } -static int prune_object(const char *fullpath, const unsigned char *sha1) +static int prune_object(const unsigned char *sha1, const char *fullpath, + void *data) { struct stat st; - if (lstat(fullpath, st)) - return error(Could not stat '%s', fullpath); + + /* +* Do we know about this object? +* It must have been reachable +*/ + if (lookup_object(sha1)) + return 0; + + if (lstat(fullpath, st)) { + /* report errors, but do not stop pruning */ + error(Could not stat '%s', fullpath); + return 0; + } if (st.st_mtime expire) return 0; if (show_only || verbose) { @@ -48,68 +60,20 @@ static int prune_object(const char *fullpath, const unsigned char *sha1) return 0; } -static int prune_dir(int i, struct strbuf *path) +static int prune_cruft(const char *basename, const char *path, void *data) { - size_t baselen = path-len; - DIR *dir = opendir(path-buf); - struct dirent *de; - - if (!dir) - return 0; - - while ((de = readdir(dir)) != NULL) { - char name[100]; - unsigned char sha1[20]; - - if (is_dot_or_dotdot(de-d_name)) - continue; - if (strlen(de-d_name) == 38) { - sprintf(name, %02x, i); - memcpy(name+2, de-d_name, 39); - if (get_sha1_hex(name, sha1) 0) - break; - - /* -* Do we know about this object? 
-			 * It must have been reachable
-			 */
-			if (lookup_object(sha1))
-				continue;
-
-			strbuf_addf(path, "/%s", de->d_name);
-			prune_object(path->buf, sha1);
-			strbuf_setlen(path, baselen);
-			continue;
-		}
-		if (starts_with(de->d_name, "tmp_obj_")) {
-			strbuf_addf(path, "/%s", de->d_name);
-			prune_tmp_file(path->buf);
-			strbuf_setlen(path, baselen);
-			continue;
-		}
-		fprintf(stderr, "bad sha1 file: %s/%s\n", path->buf, de->d_name);
-	}
-	closedir(dir);
-	if (!show_only)
-		rmdir(path->buf);
+	if (starts_with(basename, "tmp_obj_"))
+		prune_tmp_file(path);
+	else
+		fprintf(stderr, "bad sha1 file: %s\n", path);
 	return 0;
 }
 
-static void prune_object_dir(const char *path)
+static int prune_subdir(const char *basename, const char *path, void *data)
 {
-	struct strbuf buf = STRBUF_INIT;
-	size_t baselen;
-	int i;
-
-	strbuf_addstr(buf, path);
-	strbuf_addch(buf, '/');
-	baselen = buf.len;
-
-	for (i = 0; i
[PATCH 10/16] count-objects: do not use xsize_t when counting object size
The point of xsize_t is to safely cast an off_t into a size_t (because we are about to mmap). But in count-objects, we are summing the sizes in an off_t. Using xsize_t means that count-objects could fail on a 32-bit system with a 4G object (not likely, as other parts of git would fail, but we should at least be correct here).

Signed-off-by: Jeff King p...@peff.net
---
I think the on_disk_bytes is a little weird here, too. We count actual disk-usage blocks for loose objects here, which makes sense. But we do _not_ do so for packfiles, or for garbage files. Which seems kind of inconsistent. I kind of doubt anybody cares too much either way, though.

 builtin/count-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index a7f70cb..316a805 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -53,7 +53,7 @@ static void count_objects(DIR *d, char *path, int len, int verbose,
 			if (lstat(path, &st) || !S_ISREG(st.st_mode))
 				bad = 1;
 			else
-				(*loose_size) += xsize_t(on_disk_bytes(st));
+				(*loose_size) += on_disk_bytes(st);
 		}
 		if (bad) {
 			if (verbose) {
-- 
2.1.1.566.gdb1f904
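The truncation the patch worries about can be simulated on any platform by squeezing a greater-than-4GiB size through a 32-bit type. In this sketch, uint32_t stands in for a 32-bit size_t and int64_t for off_t; the names are illustrative, not git's:

```c
#include <assert.h>
#include <stdint.h>

/* off_t is typically 64-bit even on 32-bit systems (large file support) */
typedef int64_t fake_off_t;
/* stands in for size_t on a 32-bit system */
typedef uint32_t fake_size32_t;

/* summing through the narrow type silently drops the high bits */
static fake_off_t sum_via_32bit(fake_off_t total, fake_off_t obj_size)
{
	return total + (fake_size32_t)obj_size;
}

/* summing in off_t itself, as the patch does, keeps the full value */
static fake_off_t sum_via_off_t(fake_off_t total, fake_off_t obj_size)
{
	return total + obj_size;
}
```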
[PATCH 11/16] count-objects: use for_each_loose_file_in_objdir
This drops our line count considerably, and should make things more readable by keeping the counting logic separate from the traversal.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/count-objects.c | 101 ++++++++++++------------------------
 1 file changed, 30 insertions(+), 71 deletions(-)

diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index 316a805..e47ef0b 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -11,6 +11,9 @@
 static unsigned long garbage;
 static off_t size_garbage;
+static int verbose;
+static unsigned long loose, packed, packed_loose;
+static off_t loose_size;
 
 static void real_report_garbage(const char *desc, const char *path)
 {
@@ -21,61 +24,31 @@ static void real_report_garbage(const char *desc, const char *path)
 	garbage++;
 }
 
-static void count_objects(DIR *d, char *path, int len, int verbose,
-			  unsigned long *loose,
-			  off_t *loose_size,
-			  unsigned long *packed_loose)
+static void loose_garbage(const char *path)
 {
-	struct dirent *ent;
-	while ((ent = readdir(d)) != NULL) {
-		char hex[41];
-		unsigned char sha1[20];
-		const char *cp;
-		int bad = 0;
+	if (verbose)
+		report_garbage("garbage found", path);
+}
 
-		if (is_dot_or_dotdot(ent->d_name))
-			continue;
-		for (cp = ent->d_name; *cp; cp++) {
-			int ch = *cp;
-			if (('0' <= ch && ch <= '9') ||
-			    ('a' <= ch && ch <= 'f'))
-				continue;
-			bad = 1;
-			break;
-		}
-		if (cp - ent->d_name != 38)
-			bad = 1;
-		else {
-			struct stat st;
-			memcpy(path + len + 3, ent->d_name, 38);
-			path[len + 2] = '/';
-			path[len + 41] = 0;
-			if (lstat(path, &st) || !S_ISREG(st.st_mode))
-				bad = 1;
-			else
-				(*loose_size) += on_disk_bytes(st);
-		}
-		if (bad) {
-			if (verbose) {
-				struct strbuf sb = STRBUF_INIT;
-				strbuf_addf(&sb, "%.*s/%s",
-					    len + 2, path, ent->d_name);
-				report_garbage("garbage found", sb.buf);
-				strbuf_release(&sb);
-			}
-			continue;
-		}
-		(*loose)++;
-		if (!verbose)
-			continue;
-		memcpy(hex, path+len, 2);
-		memcpy(hex+2, ent->d_name, 38);
-		hex[40] = 0;
-		if (get_sha1_hex(hex, sha1))
-			die("internal error");
-		if (has_sha1_pack(sha1))
-			(*packed_loose)++;
+static int count_loose(const unsigned char *sha1, const char *path, void *data)
+{
+	struct stat st;
+
+	if (lstat(path, &st) || !S_ISREG(st.st_mode))
+		loose_garbage(path);
+	else {
+		loose_size += on_disk_bytes(st);
+		loose++;
+		if (verbose && has_sha1_pack(sha1))
+			packed_loose++;
 	}
+	return 0;
+}
+
+static int count_cruft(const char *basename, const char *path, void *data)
+{
+	loose_garbage(path);
+	return 0;
 }
 
 static char const * const count_objects_usage[] = {
@@ -85,12 +58,7 @@ static char const * const count_objects_usage[] = {
 
 int cmd_count_objects(int argc, const char **argv, const char *prefix)
 {
-	int i, verbose = 0, human_readable = 0;
-	const char *objdir = get_object_directory();
-	int len = strlen(objdir);
-	char *path = xmalloc(len + 50);
-	unsigned long loose = 0, packed = 0, packed_loose = 0;
-	off_t loose_size = 0;
+	int human_readable = 0;
 	struct option opts[] = {
 		OPT__VERBOSE(&verbose, N_("be verbose")),
 		OPT_BOOL('H', "human-readable", &human_readable,
@@ -104,19 +72,10 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 		usage_with_options(count_objects_usage, opts);
 	if (verbose)
 		report_garbage = real_report_garbage;
-	memcpy(path, objdir, len);
-	if (len && objdir[len-1] != '/')
-		path[len++] = '/';
-	for (i = 0; i < 256; i++) {
-		DIR *d;
-		sprintf(path + len, "%02x", i);
-		d = opendir(path);
-		if (!d)
-			continue;
-		count_objects(d, path, len, verbose,
-			      &loose, &loose_size, &packed_loose);
-
[PATCH 12/16] sha1_file: add for_each iterators for loose and packed objects
We typically iterate over the reachable objects in a repository by starting at the tips and walking the graph. There's no easy way to iterate over all of the objects, including unreachable ones. Let's provide a way of doing so.

Signed-off-by: Jeff King p...@peff.net
---
 cache.h     | 11 ++++++
 sha1_file.c | 62 +++++++++++++++++++++++++++
 2 files changed, 73 insertions(+)

diff --git a/cache.h b/cache.h
index 7abe7f6..3826b4b 100644
--- a/cache.h
+++ b/cache.h
@@ -1270,6 +1270,17 @@ int for_each_loose_file_in_objdir(const char *path,
 				  each_loose_subdir_fn subdir_cb,
 				  void *data);
 
+/*
+ * Iterate over loose and packed objects in both the local
+ * repository and any alternates repositories.
+ */
+typedef int each_packed_object_fn(const unsigned char *sha1,
+				  struct packed_git *pack,
+				  uint32_t pos,
+				  void *data);
+extern int for_each_loose_object(each_loose_object_fn, void *);
+extern int for_each_packed_object(each_packed_object_fn, void *);
+
 struct object_info {
 	/* Request */
 	enum object_type *typep;
diff --git a/sha1_file.c b/sha1_file.c
index 9fdad47..d017289 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3313,3 +3313,65 @@ int for_each_loose_file_in_objdir(const char *path,
 	strbuf_release(&buf);
 	return r;
 }
+
+struct loose_alt_odb_data {
+	each_loose_object_fn *cb;
+	void *data;
+};
+
+static int loose_from_alt_odb(struct alternate_object_database *alt,
+			      void *vdata)
+{
+	struct loose_alt_odb_data *data = vdata;
+	return for_each_loose_file_in_objdir(alt->base,
+					     data->cb, NULL, NULL,
+					     data->data);
+}
+
+int for_each_loose_object(each_loose_object_fn cb, void *data)
+{
+	struct loose_alt_odb_data alt;
+	int r;
+
+	r = for_each_loose_file_in_objdir(get_object_directory(),
+					  cb, NULL, NULL, data);
+	if (r)
+		return r;
+
+	alt.cb = cb;
+	alt.data = data;
+	return foreach_alt_odb(loose_from_alt_odb, &alt);
+}
+
+int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data)
+{
+	uint32_t i;
+	int r = 0;
+
+	for (i = 0; i < p->num_objects; i++) {
+		const unsigned char *sha1 =
nth_packed_object_sha1(p, i);
+
+		if (!sha1)
+			return error("unable to get sha1 of object %u in %s",
+				     i, p->pack_name);
+
+		r = cb(sha1, p, i, data);
+		if (r)
+			break;
+	}
+	return r;
+}
+
+int for_each_packed_object(each_packed_object_fn cb, void *data)
+{
+	struct packed_git *p;
+	int r = 0;
+
+	prepare_packed_git();
+	for (p = packed_git; p; p = p->next) {
+		r = for_each_object_in_pack(p, cb, data);
+		if (r)
+			break;
+	}
+	return r;
+}
-- 
2.1.1.566.gdb1f904
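The contract of these iterators, call the callback once per object, stop at the first nonzero return and propagate it, is the usual git for_each idiom. A toy version over an in-memory array shows the shape (hypothetical names; the real iterators walk packs and the object directory):

```c
#include <assert.h>
#include <stddef.h>

typedef int each_item_fn(int item, void *data);

/* call cb on each item; a nonzero return stops iteration and is returned */
static int for_each_item(const int *items, size_t n, each_item_fn cb, void *data)
{
	size_t i;
	int r = 0;

	for (i = 0; i < n; i++) {
		r = cb(items[i], data);
		if (r)
			break;
	}
	return r;
}

/* a callback that counts items, stopping with an "error" at a sentinel */
static int count_until_sentinel(int item, void *data)
{
	int *count = data;
	if (item < 0)
		return -1; /* propagate failure, like the pack iterator */
	(*count)++;
	return 0;
}
```

The void *data cookie is what lets callers keep state without globals; count-objects in the previous patch uses file-scope statics instead, which works because there is only one caller.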
Re: [PATCH 19/24] refs.c: allow listing and deleting badly named refs
On Fri, Oct 3, 2014 at 1:25 PM, Ronnie Sahlberg sahlb...@google.com wrote: On Thu, Oct 2, 2014 at 11:55 AM, Junio C Hamano gits...@pobox.com wrote: Jonathan Nieder jrnie...@gmail.com writes: From: Ronnie Sahlberg sahlb...@google.com ... In resolving functions, refuse to resolve refs that don't pass the check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME flag is passed. Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to resolve refs that escape the refs/ directory and do not match the pattern [A-Z_]* (think HEAD and MERGE_HEAD). In locking functions, refuse to act on badly named refs unless they are being deleted and either are in the refs/ directory or match [A-Z_]*. Just like other invalid refs, flag resolved, badly named refs with the REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them in all iteration functions except for for_each_rawref. Flag badly named refs with a REF_BAD_NAME flag to make it easier for future callers to notice and handle them specially. In the transaction API, refuse to create or update badly named refs, but allow deleting them (unless they escape refs/ and don't match [A-Z_]*). Signed-off-by: Ronnie Sahlberg sahlb...@google.com Signed-off-by: Jonathan Nieder jrnie...@gmail.com Thanks. We originally threw all the different kind of breakages into ISBROKEN, but a ref can have a malformed name or can contain a bad/non value and allowing us to tell them apart is a good direction to go. diff --git a/cache.h b/cache.h index 5ca7f2b..0c0ac60 100644 --- a/cache.h +++ b/cache.h @@ -978,16 +978,26 @@ extern int read_ref(const char *refname, unsigned char *sha1); * If flags is non-NULL, set the value that it points to the * combination of REF_ISPACKED (if the reference was found among the * packed references), REF_ISSYMREF (if the initial reference was a - * symbolic reference) and REF_ISBROKEN (if the ref is malformed). 
+ * symbolic reference), REF_BAD_NAME (if the reference name is ill
+ * formed --- see RESOLVE_REF_ALLOW_BAD_NAME below), and REF_ISBROKEN
+ * (if the ref is malformed).

You want to define "is malformed" here. The original defines REF_ISBROKEN as "malformed" because (1) resolve_ref_unsafe() uses get_sha1_hex() and read_loose_refs() uses read_ref_full(), both to catch malformed values stored; (2) resolve_ref_unsafe() uses check_refname_format() and catches malformed names stored as a symref target. I _think_ you are introducing ALLOW_BAD_NAME to allow the third class, .git/refs/remotes/origin/mal..formed..name. I do not know if they should be in the same class as a symref with a good name, .git/refs/remotes/origin/HEAD, that points at a bad name, mal..formed..name (which is (2) above). Perhaps not. (2) is still about what is stored in the ref, and the ref in question may or may not have a well-formed name, which is orthogonal. So probably you left only the "value stored in the ref is malformed" (in other words, we expected to find a 40-hex object name but didn't find one) case for REF_ISBROKEN?

I updated cache.h to try to clarify it better. The intention here is to expand the use of REF_ISBROKEN. For all cases REF_ISBROKEN will be set. This includes both "the sha1 value is bad" as well as "a name is bad". For those cases where a name is bad, REF_BAD_NAME is also set. Bad names are when either the ref itself has a bad name or when a bad name is encountered while resolving a chain of symbolic refs. I.e., REF_BAD_NAME is a special case of a broken ref. All REF_BAD_NAME refs are also REF_ISBROKEN, but not the reverse.

Let me add some rationale why REF_BAD_NAME and REF_ISBROKEN are not orthogonal properties. I think almost all callers of these APIs are only concerned about "is the ref good or bad", and they today only check REF_ISBROKEN. I think that is a reasonable API and it allows the majority of callers to just check this single flag.
(The alternative would be to keep all callers in sync and use a set of flags for all bad conditions.) A very small subset of callers are actually interested in knowing why the ref was bad, and in particular if the ref was bad due to a name component. Those callers, which are aware that there are different types of ISBROKEN, can then inspect the REF_BAD_NAME flag in order to decide "is the ref broken due to the ref name?". This is why I made REF_BAD_NAME a special case of REF_ISBROKEN.

Do we want to separate "value is not 40-hex" and "a symref points at a malformed refname" as separate malformed-value errors?

+ * RESOLVE_REF_ALLOW_BAD_NAME allows resolving refs even when their
+ * name is invalid according to git-check-ref-format(1). If the name
+ * is bad then the value stored in sha1 will be null_sha1 and the
+ * REF_ISBROKEN and REF_BAD_NAME flags will be set.

Viewed with that light, I am not sure if a badly named ref should yield null_sha1[] (REF_ISBROKEN, which I am assuming is about a value that is badly
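The invariant being described, that every REF_BAD_NAME ref is also REF_ISBROKEN but never the reverse, is easy to encode by always setting the two bits together. A sketch with illustrative flag values (not git's actual ones):

```c
#include <assert.h>

#define REF_ISBROKEN 0x04
#define REF_BAD_NAME 0x08 /* illustrative values, not git's */

/* a bad name is one specific way of being broken: set both bits */
static int flag_bad_name(int flags)
{
	return flags | REF_ISBROKEN | REF_BAD_NAME;
}

/* a bad value (e.g. not 40-hex) is broken, but the name may be fine */
static int flag_bad_value(int flags)
{
	return flags | REF_ISBROKEN;
}

/* most callers only ever need this single check */
static int ref_is_broken(int flags)
{
	return flags & REF_ISBROKEN;
}
```

This is the design trade-off from the message: the common caller checks one flag, and only the rare caller that cares *why* a ref is broken looks at REF_BAD_NAME.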
[PATCH 15/16] pack-objects: match prune logic for discarding objects
A recent commit taught git-prune to keep non-recent objects that are reachable from recent ones. However, pack-objects, when loosening unreachable objects, tries to optimize out the write in the case that the object will be immediately pruned. It now gets this wrong, since its rule does not reflect the new prune code (and this can be seen by running t6501 with a strategically placed repack). Let's teach pack-objects similar logic.

Signed-off-by: Jeff King p...@peff.net
---
The test changes look big because of the indentation. View with "-w" for a sane diff.

 builtin/pack-objects.c     | 38 ++++++++++---
 reachable.c                |  4 +-
 reachable.h                |  2 +
 t/t6501-freshen-objects.sh | 93 ++++++++++++++++++++--------
 4 files changed, 97 insertions(+), 40 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 2fe2ab0..1db5ea5 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -20,6 +20,8 @@
 #include "streaming.h"
 #include "thread-utils.h"
 #include "pack-bitmap.h"
+#include "reachable.h"
+#include "sha1-array.h"
 
 static const char *pack_usage[] = {
 	N_("git pack-objects --stdout [options...] [< ref-list | < object-list]"),
@@ -2407,6 +2409,15 @@ static int has_sha1_pack_kept_or_nonlocal(const unsigned char *sha1)
 	return 0;
 }
 
+/*
+ * Store a list of sha1s that should not be discarded
+ * because they are either written too recently, or are
+ * reachable from another object that was.
+ *
+ * This is filled by get_object_list.
+ */
+static struct sha1_array recent_objects;
+
 static int loosened_object_can_be_discarded(const unsigned char *sha1,
 					    unsigned long mtime)
 {
@@ -2414,6 +2425,8 @@ static int loosened_object_can_be_discarded(const unsigned char *sha1,
 		return 0;
 	if (mtime > unpack_unreachable_expiration)
 		return 0;
+	if (sha1_array_lookup(&recent_objects, sha1) >= 0)
+		return 0;
 	return 1;
 }
 
@@ -2470,6 +2483,19 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
 	return 0;
 }
 
+static void record_recent_object(struct object *obj,
+				 const struct name_path *path,
+				 const char *last,
+				 void *data)
+{
+	sha1_array_append(&recent_objects, obj->sha1);
+}
+
+static void record_recent_commit(struct commit *commit, void *data)
+{
+	sha1_array_append(&recent_objects, commit->object.sha1);
+}
+
 static void get_object_list(int ac, const char **av)
 {
 	struct rev_info revs;
@@ -2517,10 +2543,22 @@ static void get_object_list(int ac, const char **av)
 	mark_edges_uninteresting(&revs, show_edge);
 	traverse_commit_list(&revs, show_commit, show_object, NULL);
 
+	if (unpack_unreachable_expiration) {
+		if (add_unseen_recent_objects_to_traversal(&revs,
+				unpack_unreachable_expiration))
+			die("unable to add recent objects");
+		if (prepare_revision_walk(&revs))
+			die("revision walk setup failed");
+		traverse_commit_list(&revs, record_recent_commit,
+				     record_recent_object, NULL);
+	}
+
 	if (keep_unreachable)
 		add_objects_in_unpacked_packs(&revs);
 	if (unpack_unreachable)
 		loosen_unused_packed_objects(&revs);
+
+	sha1_array_clear(&recent_objects);
 }
 
 static int option_parse_index_version(const struct option *opt,
diff --git a/reachable.c b/reachable.c
index f443265..5e35f21 100644
--- a/reachable.c
+++ b/reachable.c
@@ -298,8 +298,8 @@ static int add_recent_packed(const unsigned char *sha1,
 	return 0;
 }
 
-static int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
-						  unsigned long timestamp)
+int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
+					   unsigned long timestamp)
 {
 	struct recent_data data;
 	int r;
 
diff
--git a/reachable.h b/reachable.h
index 141fe30..d23efc3 100644
--- a/reachable.h
+++ b/reachable.h
@@ -2,6 +2,8 @@
 #define REACHEABLE_H
 
 struct progress;
+extern int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
+						  unsigned long timestamp);
 extern void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 				   unsigned long mark_recent, struct progress *);
 
diff --git a/t/t6501-freshen-objects.sh b/t/t6501-freshen-objects.sh
index de941c2..e25c47d 100755
--- a/t/t6501-freshen-objects.sh
+++ b/t/t6501-freshen-objects.sh
@@ -39,50 +39,67 @@ commit () {
 	git commit -m "$1"
 }
 
-test_expect_success 'disable reflogs' '
-	git config core.logallrefupdates false &&
-	rm -rf .git/logs
-'
+maybe_repack () {
+	if
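The sha1_array_lookup call in the new discard check works by keeping the appended entries sorted and binary-searching them, sorting lazily on the first lookup. A toy equivalent over plain integers (hypothetical names; git's version lives in sha1-array.c):

```c
#include <assert.h>
#include <stdlib.h>

struct int_array {
	int *items;
	size_t nr, alloc;
	int sorted;
};

static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;
	return (x > y) - (x < y);
}

static void int_array_append(struct int_array *arr, int v)
{
	if (arr->nr == arr->alloc) {
		arr->alloc = arr->alloc ? arr->alloc * 2 : 16;
		arr->items = realloc(arr->items, arr->alloc * sizeof(int));
	}
	arr->items[arr->nr++] = v;
	arr->sorted = 0; /* appending may break the order */
}

/* like sha1_array_lookup: sort lazily, return an index >= 0 iff present */
static int int_array_lookup(struct int_array *arr, int v)
{
	const int *hit;

	if (!arr->sorted) {
		qsort(arr->items, arr->nr, sizeof(int), cmp_int);
		arr->sorted = 1;
	}
	hit = bsearch(&v, arr->items, arr->nr, sizeof(int), cmp_int);
	return hit ? (int)(hit - arr->items) : -1;
}
```

Append-then-sort fits this patch's access pattern well: all the recent objects are recorded during the traversal, and lookups only start once loosening begins.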
[PATCH 14/16] pack-objects: refactor unpack-unreachable expiration check
When we are loosening unreachable packed objects, we do not bother to process objects that would simply be pruned immediately anyway. The "would be pruned" check is a simple comparison, but is about to get more complicated. Let's pull it out into a separate function.

Note that this is slightly less efficient than the original, which avoided even opening old packs, since no object in them could pass the current check, which cares only about the pack mtime. But the new rules will depend on the exact object, so we need to perform the check even for old packs.

Note also that we fix a minor buglet when the pack mtime is exactly the same as the expiration time. The prune code considers that worth pruning, whereas our check here considered it worth keeping. This wasn't a big deal. Besides being unlikely to happen, the result was simply that the object was loosened and then pruned, missing the optimization. Still, we can easily fix it while we are here.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/pack-objects.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d391934..2fe2ab0 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2407,6 +2407,16 @@ static int has_sha1_pack_kept_or_nonlocal(const unsigned char *sha1)
 	return 0;
 }
 
+static int loosened_object_can_be_discarded(const unsigned char *sha1,
+					    unsigned long mtime)
+{
+	if (!unpack_unreachable_expiration)
+		return 0;
+	if (mtime > unpack_unreachable_expiration)
+		return 0;
+	return 1;
+}
+
 static void loosen_unused_packed_objects(struct rev_info *revs)
 {
 	struct packed_git *p;
@@ -2417,17 +2427,14 @@ static void loosen_unused_packed_objects(struct rev_info *revs)
 		if (!p->pack_local || p->pack_keep)
 			continue;
 
-		if (unpack_unreachable_expiration &&
-		    p->mtime < unpack_unreachable_expiration)
-			continue;
-
 		if (open_pack_index(p))
 			die("cannot open pack index");
 
 		for (i = 0; i < p->num_objects; i++) {
 			sha1 = nth_packed_object_sha1(p, i);
 			if
 (!packlist_find(&to_pack, sha1, NULL) &&
 			    !has_sha1_pack_kept_or_nonlocal(sha1) &&
 			    !loosened_object_can_be_discarded(sha1, p->mtime))
 				if (force_object_loose(sha1, p->mtime))
 					die("unable to force loose object");
 		}
-- 
2.1.1.566.gdb1f904
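The buglet the message describes is just the difference between `<` and `<=` at the expiration boundary. Both rules side by side, in a sketch whose variable names follow the patch:

```c
#include <assert.h>

/* old rule: discard only when the pack is strictly older than the cutoff */
static int old_can_discard(unsigned long mtime, unsigned long expiration)
{
	return expiration && mtime < expiration;
}

/*
 * new rule, matching prune: prune keeps objects with mtime > expire,
 * so anything with mtime <= expiration is fair game to discard
 */
static int new_can_discard(unsigned long mtime, unsigned long expiration)
{
	if (!expiration)
		return 0;
	if (mtime > expiration)
		return 0;
	return 1;
}
```

The two rules differ only at exact equality, which is why the mismatch was merely a missed optimization rather than a correctness problem.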
Re: [PATCH 19/24] refs.c: allow listing and deleting badly named refs
Ronnie Sahlberg sahlb...@google.com writes:

 On Thu, Oct 2, 2014 at 11:55 AM, Junio C Hamano gits...@pobox.com wrote:
 ...
 Thanks. We originally threw all the different kind of breakages into ISBROKEN, but a ref can have a malformed name or can contain a bad/non value and allowing us to tell them apart is a good direction to go.
 ...
 I updated cache.h to try to clarify it better. The intention here is to expand the use of REF_ISBROKEN. For all cases REF_ISBROKEN will be set. This includes both "the sha1 value is bad" as well as "a name is bad".

OK. As long as it is documented in a way to help other people who touch the code later to tell what these REF_* mean, either way is fine between "BADNAME is just one of the possible ways a ref is ISBROKEN" and "BADNAME is a way and ISBROKEN is another way for a ref to be bad", and I agree that the former (i.e., the way you defined) is easier on the existing code.

Thanks.
[PATCH 13/16] prune: keep objects reachable from recent objects
Our current strategy with prune is that an object falls into one of three categories:

  1. Reachable (from ref tips, reflogs, index, etc).

  2. Not reachable, but recent (based on the --expire time and the file's mtime).

  3. Not reachable and not recent.

We keep objects from (1) and (2), but prune objects in (3). The point of (2) is that these objects may be part of an in-progress operation that has not yet updated any refs.

However, it is not always the case that objects for an in-progress operation will have a recent mtime. For example, the object database may have an old copy of a blob (from an abandoned operation, a branch that was deleted, etc). If we create a new tree that points to it, a simultaneous prune will leave our tree, but delete the blob. Referencing that tree with a commit will then work (we check that the tree is in the object database, but not that all of its referred objects are), as will mentioning the commit in a ref. But the resulting repo is corrupt; we are missing the blob reachable from a ref.

One way to solve this is to be more thorough when referencing a sha1: make sure that not only do we have that sha1, but that we have the objects it refers to, and so forth recursively. The problem is that this is very expensive. Creating a parent link would require traversing the entire object graph down to the roots.

Instead, this patch pushes the extra work onto prune, which runs less frequently (and has to look at the whole object graph anyway). It creates a new category of objects: objects which are not recent, but which are reachable from a recent object. We do not prune these objects, just like the reachable and recent ones.
This lets us avoid the recursive check above, because if we have an object, even if it is unreachable, we should have its referent:

  - if we are creating new objects, then we cannot create the parent object without having the child

  - and if we are pruning objects, we will not prune the child if we are keeping the parent

The big exception would be if one were to write the object in a way that avoided referential integrity (e.g., using hash-object). But if you are in the habit of doing that, you deserve what you get.

Naively, the simplest way to implement this would be to add all recent objects as tips to the reachability traversal. However, this does not perform well. In a recently-packed repository, all reachable objects will also be recent, and therefore we have to consider each object twice (both as a tip, and when we reach it in the traversal). I tested this, and it added about 10s to a 30s prune on linux.git. This patch instead performs the normal reachability traversal first, then follows up with a second traversal for recent objects, skipping any that have already been marked.

Signed-off-by: Jeff King p...@peff.net
---
I put the mark-recent code into mark_reachable_objects here, but it does not technically have to be there. It reuses the same rev_info object (which is convenient), but the SEEN flags from the first traversal are marked on the global commit objects themselves. So we could break it out into a separate function. However, we'd have to refactor the progress reporting; the numbers are kept internally to mark_reachable, and we would want to continue them for the second traversal (though I suppose you could start a second progress meter with "Checking recent objects" or something if you wanted).
 builtin/prune.c            |   2 +-
 builtin/reflog.c           |   2 +-
 reachable.c                | 111 ++++++++++++++++++++++
 reachable.h                |   3 +-
 t/t6501-freshen-objects.sh |  88 ++++++++++++++++++
 5 files changed, 203 insertions(+), 3 deletions(-)
 create mode 100755 t/t6501-freshen-objects.sh

diff --git a/builtin/prune.c b/builtin/prune.c
index 8286680..a965574 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -135,7 +135,7 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 	if (show_progress)
 		progress = start_progress_delay(_("Checking connectivity"), 0, 0, 2);
 
-	mark_reachable_objects(&revs, 1, progress);
+	mark_reachable_objects(&revs, 1, expire, progress);
 	stop_progress(&progress);
 	for_each_loose_file_in_objdir(get_object_directory(), prune_object,
 				      prune_cruft, prune_subdir, NULL);
diff --git a/builtin/reflog.c b/builtin/reflog.c
index e8a8fb1..80bddc2 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -649,7 +649,7 @@ static int cmd_reflog_expire(int argc, const char **argv, const char *prefix)
 		init_revisions(&cb.revs, prefix);
 		if (cb.verbose)
 			printf("Marking reachable objects...");
-		mark_reachable_objects(&cb.revs, 0, NULL);
+		mark_reachable_objects(&cb.revs, 0, 0, NULL);
 		if (cb.verbose)
 			putchar('\n');
 	}
diff --git
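The two-pass strategy described above (mark everything reachable from the tips first, then sweep the remaining objects and traverse only from the recent, not-yet-seen ones) can be sketched over a toy parent-pointer graph. The structures are hypothetical stand-ins for git's object graph and SEEN flag:

```c
#include <assert.h>
#include <stddef.h>

#define SEEN 1

struct obj {
	struct obj *referent; /* at most one outgoing edge, for simplicity */
	unsigned flags;
	int recent;           /* mtime newer than the expire cutoff? */
};

/* mark obj and everything it refers to */
static void mark_reachable(struct obj *o)
{
	while (o && !(o->flags & SEEN)) {
		o->flags |= SEEN;
		o = o->referent;
	}
}

/*
 * pass 1: mark from the tips; pass 2: also mark from recent objects,
 * skipping any already seen. The skip is what keeps the second pass
 * cheap in a freshly packed repo, where everything reachable is also
 * recent and has already been marked.
 */
static void mark_kept(struct obj **tips, size_t ntips,
		      struct obj **all, size_t nall)
{
	size_t i;

	for (i = 0; i < ntips; i++)
		mark_reachable(tips[i]);
	for (i = 0; i < nall; i++)
		if (all[i]->recent && !(all[i]->flags & SEEN))
			mark_reachable(all[i]);
}
```

The test below replays the scenario from the commit message: an old, unreachable blob survives because a recent (unreachable) tree points at it, while an old object with no recent referrer is still prunable.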
[PATCH 16/16] write_sha1_file: freshen existing objects
When we try to write a loose object file, we first check whether that object already exists. If so, we skip the write as an optimization. However, this can interfere with prune's strategy of using mtimes to mark files in progress.

For example, if a branch contains a particular tree object and is deleted, that tree object may become unreachable, and have an old mtime. If a new operation then tries to write the same tree, this ends up as a noop; we notice we already have the object and do nothing. A prune running simultaneously with this operation will see the object as old, and may delete it.

We can solve this by freshening objects that we avoid writing by updating their mtime. The algorithm for doing so is essentially the same as that of has_sha1_file. Therefore we provide a new (static) interface check_and_freshen, which finds and optionally freshens the object. It's trivial to implement freshening and simple checking by tweaking a single parameter.

Signed-off-by: Jeff King p...@peff.net
---
 sha1_file.c                | 51 ++++++++++++++++++++++++++-------
 t/t6501-freshen-objects.sh | 27 +++++++++++++++++
 2 files changed, 71 insertions(+), 7 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index d017289..d263b38 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -442,27 +442,53 @@ void prepare_alt_odb(void)
 	read_info_alternates(get_object_directory(), 0);
 }
 
-static int has_loose_object_local(const unsigned char *sha1)
+static int freshen_file(const char *fn)
 {
-	return !access(sha1_file_name(sha1), F_OK);
+	struct utimbuf t;
+	t.actime = t.modtime = time(NULL);
+	return utime(fn, &t);
 }
 
-int has_loose_object_nonlocal(const unsigned char *sha1)
+static int check_and_freshen_file(const char *fn, int freshen)
+{
+	if (access(fn, F_OK))
+		return 0;
+	if (freshen && freshen_file(fn))
+		return 0;
+	return 1;
+}
+
+static int check_and_freshen_local(const unsigned char *sha1, int freshen)
+{
+	return check_and_freshen_file(sha1_file_name(sha1), freshen);
+}
+
+static int check_and_freshen_nonlocal(const unsigned char *sha1, int freshen)
{
 	struct alternate_object_database *alt;
 	prepare_alt_odb();
 	for (alt = alt_odb_list; alt; alt = alt->next) {
 		fill_sha1_path(alt->name, sha1);
-		if (!access(alt->base, F_OK))
+		if (check_and_freshen_file(alt->base, freshen))
 			return 1;
 	}
 	return 0;
 }
 
+static int check_and_freshen(const unsigned char *sha1, int freshen)
+{
+	return check_and_freshen_local(sha1, freshen) ||
+	       check_and_freshen_nonlocal(sha1, freshen);
+}
+
+int has_loose_object_nonlocal(const unsigned char *sha1)
+{
+	return check_and_freshen_nonlocal(sha1, 0);
+}
+
 static int has_loose_object(const unsigned char *sha1)
 {
-	return has_loose_object_local(sha1) ||
-	       has_loose_object_nonlocal(sha1);
+	return check_and_freshen(sha1, 0);
 }
 
 static unsigned int pack_used_ctr;
@@ -2949,6 +2975,17 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	return move_temp_to_file(tmp_file, filename);
 }
 
+static int freshen_loose_object(const unsigned char *sha1)
+{
+	return check_and_freshen(sha1, 1);
+}
+
+static int freshen_packed_object(const unsigned char *sha1)
+{
+	struct pack_entry e;
+	return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
+}
+
 int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
 {
 	unsigned char sha1[20];
@@ -2961,7 +2998,7 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
 	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
 	if (returnsha1)
 		hashcpy(returnsha1, sha1);
-	if (has_sha1_file(sha1))
+	if (freshen_loose_object(sha1) || freshen_packed_object(sha1))
 		return 0;
 	return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
 }
diff --git a/t/t6501-freshen-objects.sh b/t/t6501-freshen-objects.sh
index e25c47d..157f3f9 100755
--- a/t/t6501-freshen-objects.sh
+++ b/t/t6501-freshen-objects.sh
@@ -100,6 +100,33 @@ for repack in '' true; do
 	test_expect_success "repository passes fsck ($title)" '
 		git fsck
 	'
+
+	test_expect_success "abandon objects again ($title)" '
+		git reset --hard
 HEAD^ &&
+		find .git/objects -type f |
+		xargs test-chmtime -v -86400
+	'
+
+	test_expect_success "start writing new commit with same tree ($title)" '
+		tree=$(
+			GIT_INDEX_FILE=index.tmp &&
+			export GIT_INDEX_FILE &&
+			git read-tree HEAD &&
+			add abandon &&
+			add unrelated &&
+			git write-tree
+		) &&
+
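check_and_freshen_file in this patch is just access(2) plus utime(2), so its behavior can be exercised standalone against a scratch file (POSIX only; the temporary filename is chosen by mkstemp, and the helper names mirror the patch):

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>
#include <utime.h>

/* set the file's atime/mtime to "now" */
static int freshen_file(const char *fn)
{
	struct utimbuf t;
	t.actime = t.modtime = time(NULL);
	return utime(fn, &t);
}

/* 1 if fn exists (and, when asked, was successfully freshened) */
static int check_and_freshen_file(const char *fn, int freshen)
{
	if (access(fn, F_OK))
		return 0;
	if (freshen && freshen_file(fn))
		return 0;
	return 1;
}
```

Note the single `freshen` parameter the commit message mentions: with it set to 0, this degenerates to the old existence check; with it set to 1, a successful hit also resets the mtime that prune inspects.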
Re: [PATCH 16/16] write_sha1_file: freshen existing objects
Jeff King p...@peff.net writes:

 When we try to write a loose object file, we first check whether that object already exists. If so, we skip the write as an optimization. However, this can interfere with prune's strategy of using mtimes to mark files in progress.

 For example, if a branch contains a particular tree object and is deleted, that tree object may become unreachable, and have an old mtime. If a new operation then tries to write the same tree, this ends up as a noop; we notice we already have the object and do nothing. A prune running simultaneously with this operation will see the object as old, and may delete it.

 We can solve this by freshening objects that we avoid writing by updating their mtime. The algorithm for doing so is essentially the same as that of has_sha1_file. Therefore we provide a new (static) interface check_and_freshen, which finds and optionally freshens the object. It's trivial to implement freshening and simple checking by tweaking a single parameter.

An old referent of a recent unreachable object may be in a pack. Is it expected that the same pack will have many similar old objects (in other words, is it worth trying to optimize check_and_freshen by bypassing access() and utime(), perhaps by keeping a "freshened in this process already" flag in struct packed_git)?

Could check_and_freshen_nonlocal() ever be called with freshen set to true? Should it be? In other words, should we be mucking with objects in other people's repositories with utime()?

Other than these two minor questions, I am happy with this patch. Thanks.
Signed-off-by: Jeff King p...@peff.net
---
 sha1_file.c                | 51 ++++++++++++++++++++++++++-------
 t/t6501-freshen-objects.sh | 27 +++++++++++++++++
 2 files changed, 71 insertions(+), 7 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index d017289..d263b38 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -442,27 +442,53 @@ void prepare_alt_odb(void)
 	read_info_alternates(get_object_directory(), 0);
 }
 
-static int has_loose_object_local(const unsigned char *sha1)
+static int freshen_file(const char *fn)
 {
-	return !access(sha1_file_name(sha1), F_OK);
+	struct utimbuf t;
+	t.actime = t.modtime = time(NULL);
+	return utime(fn, &t);
 }
 
-int has_loose_object_nonlocal(const unsigned char *sha1)
+static int check_and_freshen_file(const char *fn, int freshen)
+{
+	if (access(fn, F_OK))
+		return 0;
+	if (freshen && freshen_file(fn))
+		return 0;
+	return 1;
+}
+
+static int check_and_freshen_local(const unsigned char *sha1, int freshen)
+{
+	return check_and_freshen_file(sha1_file_name(sha1), freshen);
+}
+
+static int check_and_freshen_nonlocal(const unsigned char *sha1, int freshen)
 {
 	struct alternate_object_database *alt;
 	prepare_alt_odb();
 	for (alt = alt_odb_list; alt; alt = alt->next) {
 		fill_sha1_path(alt->name, sha1);
-		if (!access(alt->base, F_OK))
+		if (check_and_freshen_file(alt->base, freshen))
 			return 1;
 	}
 	return 0;
 }
 
+static int check_and_freshen(const unsigned char *sha1, int freshen)
+{
+	return check_and_freshen_local(sha1, freshen) ||
+	       check_and_freshen_nonlocal(sha1, freshen);
+}
+
+int has_loose_object_nonlocal(const unsigned char *sha1)
+{
+	return check_and_freshen_nonlocal(sha1, 0);
+}
+
 static int has_loose_object(const unsigned char *sha1)
 {
-	return has_loose_object_local(sha1) ||
-	       has_loose_object_nonlocal(sha1);
+	return check_and_freshen(sha1, 0);
 }
 
 static unsigned int pack_used_ctr;
@@ -2949,6 +2975,17 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	return move_temp_to_file(tmp_file, filename);
 }
 
+static int freshen_loose_object(const unsigned char *sha1)
+{
+	return
check_and_freshen(sha1, 1); +} + +static int freshen_packed_object(const unsigned char *sha1) +{ + struct pack_entry e; + return find_pack_entry(sha1, e) freshen_file(e.p-pack_name); +} + int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1) { unsigned char sha1[20]; @@ -2961,7 +2998,7 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign write_sha1_file_prepare(buf, len, type, sha1, hdr, hdrlen); if (returnsha1) hashcpy(returnsha1, sha1); - if (has_sha1_file(sha1)) + if (freshen_loose_object(sha1) || freshen_packed_object(sha1)) return 0; return write_loose_object(sha1, hdr, hdrlen, buf, len, 0); } diff --git a/t/t6501-freshen-objects.sh b/t/t6501-freshen-objects.sh index e25c47d..157f3f9 100755 --- a/t/t6501-freshen-objects.sh +++ b/t/t6501-freshen-objects.sh @@ -100,6 +100,33 @@ for repack in '' true; do test_expect_success
[PATCH v22.5 09/24] refs.c: pass a list of names to skip to is_refname_available
From: Ronnie Sahlberg sahlb...@google.com
Date: Thu, 1 May 2014 11:16:07 -0700

commit 23acf975d74825789112a3a7ba97dbbdc76904f4 upstream.

Change is_refname_available to take a list of strings to exclude when checking for conflicts instead of just one single name. We can already exclude a single name for the sake of renames. This generalizes that support.

ref_transaction_commit already tracks a set of refs that are being deleted in an array. This array is then used to exclude refs from being written to the packed-refs file. At some stage we will want to change this array to a struct string_list and then we can pass it to is_refname_available via the call to lock_ref_sha1_basic. That will allow us to perform transactions that perform multiple renames as long as there are no conflicts within the starting or ending state. For example, that would allow a single transaction that contains two renames that are both individually conflicting:

    m -> n/n
    n -> m/m

No functional change intended yet.

Change-Id: I8c68f0a47eef25759d572436ba683cb7b1ccb7eb
Signed-off-by: Ronnie Sahlberg sahlb...@google.com
Signed-off-by: Jonathan Nieder jrnie...@gmail.com
---
Junio C Hamano wrote: Jonathan Nieder jrnie...@gmail.com writes: Thanks for the heads up. (I hadn't realized the ref-transaction-1 series was part of 'master' --- yay!) Rebased and reworked: https://code-review.googlesource.com/1027 Heh, I was sort of expecting that heavy-weight contributors who throw many and/or large patch series would at least be paying attention to "What's cooking" reports. That's fair. I don't pay enough attention to the 'graduated' section. Will take a look over there (I really hate having to context switch only to review this series, though). Here's a copy to save a trip.
 refs.c | 50 --
 1 file changed, 32 insertions(+), 18 deletions(-)

diff --git a/refs.c b/refs.c
index 83b2c77..05e42a7 100644
--- a/refs.c
+++ b/refs.c
@@ -785,13 +785,13 @@ static void prime_ref_dir(struct ref_dir *dir)
 	}
 }
 
-static int entry_matches(struct ref_entry *entry, const char *refname)
+static int entry_matches(struct ref_entry *entry, const struct string_list *list)
 {
-	return refname && !strcmp(entry->name, refname);
+	return list && string_list_has_string(list, entry->name);
 }
 
 struct nonmatching_ref_data {
-	const char *skip;
+	const struct string_list *skip;
 	struct ref_entry *found;
 };
 
@@ -815,16 +815,19 @@ static void report_refname_conflict(struct ref_entry *entry,
 /*
  * Return true iff a reference named refname could be created without
  * conflicting with the name of an existing reference in dir.  If
- * oldrefname is non-NULL, ignore potential conflicts with oldrefname
- * (e.g., because oldrefname is scheduled for deletion in the same
+ * skip is non-NULL, ignore potential conflicts with refs in skip
+ * (e.g., because they are scheduled for deletion in the same
  * operation).
  *
  * Two reference names conflict if one of them exactly matches the
  * leading components of the other; e.g., "foo/bar" conflicts with
  * both "foo" and with "foo/bar/baz" but not with "foo/bars" or
  * "foo/barbados".
+ *
+ * skip must be sorted.
  */
-static int is_refname_available(const char *refname, const char *oldrefname,
+static int is_refname_available(const char *refname,
+				const struct string_list *skip,
 				struct ref_dir *dir)
 {
 	const char *slash;
@@ -838,12 +841,12 @@ static int is_refname_available(const char *refname, const char *oldrefname,
 		 * looking for a conflict with a leaf entry.
 		 *
 		 * If we find one, we still must make sure it is
-		 * not oldrefname.
+		 * not in skip.
 		 */
 		pos = search_ref_dir(dir, refname, slash - refname);
 		if (pos >= 0) {
 			struct ref_entry *entry = dir->entries[pos];
-			if (entry_matches(entry, oldrefname))
+			if (entry_matches(entry, skip))
 				return 1;
 			report_refname_conflict(entry, refname);
 			return 0;
@@ -876,13 +879,13 @@ static int is_refname_available(const char *refname, const char *oldrefname,
 		/*
 		 * We found a directory named refname. It is a
 		 * problem iff it contains any ref that is not
-		 * oldrefname.
+		 * in skip.
 		 */
 		struct ref_entry *entry = dir->entries[pos];
 		struct ref_dir *dir = get_ref_dir(entry);
 		struct nonmatching_ref_data data;
 
-		data.skip = oldrefname;
+		data.skip = skip;
 		sort_ref_dir(dir);
 		if (!do_for_each_entry_in_dir(dir, 0, nonmatching_ref_fn, &data))
 			return
Re: [PATCH 13/16] prune: keep objects reachable from recent objects
Jeff King p...@peff.net writes: Instead, this patch pushes the extra work onto prune, which runs less frequently (and has to look at the whole object graph anyway). It creates a new category of objects: objects which are not recent, but which are reachable from a recent object. We do not prune these objects, just like the reachable and recent ones. This lets us avoid the recursive check above, because if we have an object, even if it is unreachable, we should have its referent: - if we are creating new objects, then we cannot create the parent object without having the child - and if we are pruning objects, will not prune the child if we are keeping the parent

Sorry but this part is beyond a simple panda brain. I can understand this "If we have an object, even if it is unreachable, we should have its referent." as a description of the desired behaviour. If we have a tree that is unreachable, we must make sure that we do not discard blobs that are reachable from that tree, or we would end up corrupting our repository if we ever allow that tree to become reachable from our refs later. But how does that connect to these two bullet points?

- if we are creating new objects, then we cannot create the parent object without having the child

We cannot create the parent (e.g. tree) without having the child (e.g. a blob that is referred to by the tree we are creating). So this bullet point is repeating the same thing?

- and if we are pruning objects, will not prune the child if we are keeping the parent

We will not prune blobs that are reachable from a tree that we are not yet ready to prune. So this again is repeating the same thing?

But these are "this is how we want our system to behave". And if we assume our system behaves like so, then prune would be safe. But it is unclear how that behaviour is realized. Puzzled...

... goes and thinks ...
With this patch applied, the system will not prune unreachable old objects that are reachable from a recent object (the recent object itself may or may not be reachable but that does not make any difference). And that is sufficient to ensure the integrity of the repository even if you allow new objects to be created reusing any of these unreachable objects that are left behind by prune, because the reachability check done during prune (with this patch applied) makes sure any object left in the repository can safely be used as a starting point of connectivity traversal. Ok, I think I got it now, but then do we still need to utime(2) the loose object files for unreachable objects that are referenced by a recent object (which is done in a later patch), or is that purely an optimization for the next round of gc where you would have more recent objects (i.e. you do not have to traverse to find out an old one is reachable from a new one, as there will be fewer old ones)? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 07/16] t5304: use test_path_is_* instead of test -f
Jeff King p...@peff.net writes: This is slightly more robust (checking "! test -f" would not notice a directory of the same name, though that is not likely to happen here). It also makes debugging easier, as the test script will output a message on failure. Signed-off-by: Jeff King p...@peff.net --- This patch is totally optional. I did it while debugging t5304 (strange how badly prune works when you accidentally invert the mtime check!) and figured it might be worth keeping as a cleanup. Looks sensible; thanks.
Re: [PATCH 08/16] t5304: use helper to report failure of test foo = bar
Jeff King p...@peff.net writes:

For small outputs, we sometimes use:

    test "$(some_cmd)" = "something we expect"

instead of a full test_cmp. The downside of this is that when it fails, there is no output at all from the script. Let's introduce a small helper to make tests easier to debug.

Signed-off-by: Jeff King p...@peff.net
---
This is in the same boat as the last commit; we can drop it without hurting the rest of the series. Is test_eq too cutesy or obfuscated?

Not really. As long as its two strings are not too long, the output may still be readable. I have often wanted it when debugging other tests, too.

Our usual technique is to do:

    echo whatever >expect
    do_something >actual
    test_cmp expect actual

That's a bit verbose. We could hide it behind something like test_eq, too, but it introduces several extra new processes.

What do you mean by "extra new processes"? Whether open coded in a verbose way, or wrapped inside a helper, e.g.

    test_eql () {
        echo "$1" >expect
        shift
        "$@" >actual
        test_cmp expect actual
    }
    ...
    test_eql whatever do_something

the number of processes would be the same, no? Or do you mean test_cmp is an extra process compared with

    test_eq whatever "$(do_something)"

Hopefully, do_something does something more than what it takes test_cmp to run, so I wouldn't be worried too much about it.
Re: [PATCH 09/16] prune: factor out loose-object directory traversal
Jeff King p...@peff.net writes: Prune has to walk $GIT_DIR/objects/?? in order to find the set of loose objects to prune. Other parts of the code (e.g., count-objects) want to do the same. Let's factor it out into a reusable for_each-style function. Doesn't fsck also look at these as well? I recall that we also have a sort by inum trick employed there. Would it also be applicable to these two callers?
Re: [PATCH 0/16] make prune mtime-checking more careful
Jeff King p...@peff.net writes: ... The objects are removed by prune, which doesn't realize that they are part of an ongoing operation. Prune uses the filesystem mtime to determine this, but we are not very thorough in making sure that is kept up to date. The whole series looked quite sensible. Thanks.
[PATCH] bundle: plug minor memory leak in is_tag_in_date_range()
Free the buffer returned by read_sha1_file() even if no valid tagger line is found.

Signed-off-by: Rene Scharfe l@web.de
---
 bundle.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/bundle.c b/bundle.c
index b2b89fe..9ed865c 100644
--- a/bundle.c
+++ b/bundle.c
@@ -211,24 +211,28 @@ static int is_tag_in_date_range(struct object *tag, struct rev_info *revs)
 	enum object_type type;
 	char *buf, *line, *lineend;
 	unsigned long date;
+	int result = 1;
 
 	if (revs->max_age == -1 && revs->min_age == -1)
-		return 1;
+		goto out;
 
 	buf = read_sha1_file(tag->sha1, &type, &size);
 	if (!buf)
-		return 1;
+		goto out;
 	line = memmem(buf, size, "\ntagger ", 8);
 	if (!line++)
-		return 1;
+		goto out_free;
 	lineend = memchr(line, '\n', buf + size - line);
 	line = memchr(line, '>', lineend ? lineend - line : buf + size - line);
 	if (!line++)
-		return 1;
+		goto out_free;
 	date = strtoul(line, NULL, 10);
-	free(buf);
-	return (revs->max_age == -1 || revs->max_age < date) &&
+	result = (revs->max_age == -1 || revs->max_age < date) &&
 		(revs->min_age == -1 || revs->min_age > date);
+out_free:
+	free(buf);
+out:
+	return result;
 }
 
 int create_bundle(struct bundle_header *header, const char *path,
-- 
2.1.2
Re: [PATCH 01/16] foreach_alt_odb: propagate return value from callback
On 03.10.2014 at 22:21, Jeff King wrote: We check the return value of the callback and stop iterating if it is non-zero. However, we do not make the non-zero return value available to the caller, so they have no way of knowing whether the operation succeeded or not (technically they can keep their own error flag in the callback data, but that is unlike our other for_each functions).

Signed-off-by: Jeff King p...@peff.net
---
 cache.h     |  2 +-
 sha1_file.c | 12
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index 8206039..cd16e25 100644
--- a/cache.h
+++ b/cache.h
@@ -1143,7 +1143,7 @@ extern void prepare_alt_odb(void);
 extern void read_info_alternates(const char * relative_base, int depth);
 extern void add_to_alternates_file(const char *reference);
 typedef int alt_odb_fn(struct alternate_object_database *, void *);
-extern void foreach_alt_odb(alt_odb_fn, void*);
+extern int foreach_alt_odb(alt_odb_fn, void*);
 
 struct pack_window {
 	struct pack_window *next;
diff --git a/sha1_file.c b/sha1_file.c
index c08c0cb..bae1c15 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -412,14 +412,18 @@ void add_to_alternates_file(const char *reference)
 	link_alt_odb_entries(alt, strlen(alt), '\n', NULL, 0);
 }
 
-void foreach_alt_odb(alt_odb_fn fn, void *cb)
+int foreach_alt_odb(alt_odb_fn fn, void *cb)
 {
 	struct alternate_object_database *ent;
+	int r = 0;
 
 	prepare_alt_odb();
-	for (ent = alt_odb_list; ent; ent = ent->next)
-		if (fn(ent, cb))
-			return;
+	for (ent = alt_odb_list; ent; ent = ent->next) {
+		int r = fn(ent, cb);
+		if (r)
+			break;
+	}
+	return r;

This will always return zero. You probably shadowed r unintentionally inside the loop, right?

 }
 
 void prepare_alt_odb(void)
Re: [PATCH] bundle: plug minor memory leak in is_tag_in_date_range()
René Scharfe wrote:

    --- a/bundle.c
    +++ b/bundle.c
    @@ -211,24 +211,28 @@ static int is_tag_in_date_range(struct object *tag, struct rev_info *revs)
     	enum object_type type;
     	char *buf, *line, *lineend;

If buf is initialized to NULL, there is no need for separate out and out_free labels. With or without such a change,

Reviewed-by: Jonathan Nieder jrnie...@gmail.com

diff --git i/bundle.c w/bundle.c
index 4158e11..101cde0 100644
--- i/bundle.c
+++ w/bundle.c
@@ -209,7 +209,7 @@ static int is_tag_in_date_range(struct object *tag, struct rev_info *revs)
 {
 	unsigned long size;
 	enum object_type type;
-	char *buf, *line, *lineend;
+	char *buf = NULL, *line, *lineend;
 	unsigned long date;
 	int result = 1;
 
@@ -221,17 +221,16 @@ static int is_tag_in_date_range(struct object *tag, struct rev_info *revs)
 		goto out;
 	line = memmem(buf, size, "\ntagger ", 8);
 	if (!line++)
-		goto out_free;
+		goto out;
 	lineend = memchr(line, '\n', buf + size - line);
 	line = memchr(line, '>', lineend ? lineend - line : buf + size - line);
 	if (!line++)
-		goto out_free;
+		goto out;
 	date = strtoul(line, NULL, 10);
 	result = (revs->max_age == -1 || revs->max_age < date) &&
 		(revs->min_age == -1 || revs->min_age > date);
-out_free:
-	free(buf);
 out:
+	free(buf);
 	return result;
 }
Re: [PATCH 16/16] write_sha1_file: freshen existing objects
On Fri, Oct 03, 2014 at 02:29:58PM -0700, Junio C Hamano wrote: We can solve this by freshening objects that we avoid writing by updating their mtime. The algorithm for doing so is essentially the same as that of has_sha1_file. Therefore we provide a new (static) interface check_and_freshen, which finds and optionally freshens the object. It's trivial to implement freshening and simple checking by tweaking a single parameter. An old referent by a recent unreachable may be in pack. Is it expected that the same pack will have many similar old objects (in other words, is it worth trying to optimize check-and-freshen by bypassing access() and utime(), perhaps by keeping a freshened in this process already flag in struct packed_git)? Thanks for reminding me. I considered something like that early on and then completely forgot to revisit it. I do not have numbers either way on whether it is an optimization worth doing. On the one hand, it is very easy to do. On the other, it probably does not make a big difference; we are literally skipping the write of an entire object, and have just run a complete sha1 over the contents. A single utime() call probably is not a big deal. Could check-and-freshen-nonlocal() ever be called with freshen set to true? Should it be? In other words, should we be mucking with objects in other people's repositories with utime()? Yes, it can, and I think the answer to should is yes for safety, though I agree it feels a little hacky. I did explicitly write it so that we fail-safe when freshening doesn't work. That is, if we try to freshen an object that is in an alternate and we cannot (e.g., because we don't have write access), we'll fallback to writing out a new loose object locally. That's very much the safest thing to do, but obviously it performs less well. Again, this is the code path where we _would have_ written out the object anyway, so it might not be that bad. 
But I don't know to what degree the current code relies on that optimization for reasonable performance. E.g., if you clone from a read-only alternate and then try to `git write-tree` immediately on the index, will we literally make a full copy of each tree object? Hmm, that should be easy to test...

    $ su - nobody
    $ git clone -s ~peff/compile/linux /tmp/foo
    $ cd /tmp/foo
    $ git count-objects
    0 objects, 0 kilobytes
    $ git write-tree
    $ git count-objects
    0 objects, 0 kilobytes

So far so good. Let's blow away the cache-tree to make sure...

    $ rm .git/index
    $ git read-tree HEAD
    $ git write-tree
    $ git count-objects
    0 objects, 0 kilobytes

So that's promising. But it's far from a proof that there isn't some other code path that will be negatively impacted.

-Peff
Re: [PATCH 13/16] prune: keep objects reachable from recent objects
On Fri, Oct 03, 2014 at 02:47:57PM -0700, Junio C Hamano wrote: Sorry but this part is beyond a simple panda brain. I probably didn't explain it very well. I found it rather confusing to reason about. Let's see if we can go through it together. I can understand this If we have an object, even if it is unreachable, we should have its referent. as a description of the desired behaviour. If we have a tree that is unreachable, we must make sure that we do not discard blobs that are reachable from that tree, or we would end up corrupting our repository if we ever allow that tree to become reachable from our refs later. Yes, exactly. - if we are creating new objects, then we cannot create the parent object without having the child We cannot create the parent (e.g. tree) without having the child (e.g. blob that is referred to by the tree we are creating). So this bullet point is repeating the same thing? Sort of. The first statement if we have an object, we should have its referent is a desired state. The bullet points are trying to reason about the _changes_ in state, and argue that there are no state changes that take us from a valid state to an invalid one. There are two actions that affect the state of the object database: adding objects and removing objects. We cannot go from a valid state to an invalid state by adding objects, because we cannot create the parent without having the child. That is already the case before this patch (though as noted, you can cheat if you know the sha1 from another repository and bypass git's safety checks, but I do not think that is a case worth worrying about). - and if we are pruning objects, will not prune the child if we are keeping the parent We will not prune blob that are reachable from a tree that we are not yet ready to prune. So this again is repeating the same thing? This is the other action. When removing objects, we will keep the child along with the parent, and therefore we cannot move to an invalid state. 
That's the part that this patch does. With this patch applied, the system will not prune unreachable old objects that are reachable from a recent object (the recent object itself may or may not be reachable but that does not make any difference). And that is sufficient to ensure the integrity of the repository even if you allow new objects to be created reusing any of these unreachable objects that are left behind by prune, because the reachability check done during prune (with this patch applied) makes sure any object left in the repository can safely be used as a starting point of connectivity traversal. Yes, exactly. Ok, I think I got it now, but then do we still need to utime(2) the loose object files for unreachable objects that are referenced by a recent object (which is done in a later patch), or is that purely an optimization for the next round of gc where you would have more recent objects (i.e. you do not have to traverse to find out an old one is reachable from a new one, as there will be fewer old ones)? No, we don't need to utime() the loose objects. As long as there is a recently-written object that refers to them, they are considered worth keeping. The later patch that calls utime() on objects is really about defeating the write_sha1_file optimization. That is, you might _think_ you have written a recent object that refers to other objects, but the sha1-file code silently turned it into a noop. Does that make more sense? -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/16] t5304: use helper to report failure of test foo = bar
On Fri, Oct 03, 2014 at 03:17:18PM -0700, Junio C Hamano wrote:

That's a bit verbose. We could hide it behind something like test_eq, too, but it introduces several extra new processes.

What do you mean by "extra new processes"? Whether open coded in a verbose way, or wrapped inside a helper, e.g.

    test_eql () {
        echo "$1" >expect
        shift
        "$@" >actual
        test_cmp expect actual
    }
    ...
    test_eql whatever do_something

the number of processes would be the same, no? Or do you mean test_cmp is an extra process compared with

    test_eq whatever "$(do_something)"

Sorry, yeah, I meant new processes versus test "$foo" = "$bar".

Hopefully, do_something does something more than what it takes test_cmp to run, so I wouldn't be worried too much about it.

Yeah, I may just be overly paranoid here. If we are not worried about a few extra processes, then the test_eql you showed above may be preferable, because its output is uniform with other test_cmp tests (although maybe it also introduces problems, because it does not handle stray whitespace in the same way, and it puts extra files in the working tree).

-Peff
Re: [PATCH 09/16] prune: factor out loose-object directory traversal
On Fri, Oct 03, 2014 at 03:19:29PM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes: Prune has to walk $GIT_DIR/objects/?? in order to find the set of loose objects to prune. Other parts of the code (e.g., count-objects) want to do the same. Let's factor it out into a reusable for_each-style function. Doesn't fsck also look at these as well? I recall that we also have a sort by inum trick employed there. Would it also be applicable to these two callers? I didn't think to check fsck. It does have a similar function and could be converted. Fsck does store the inode and later visits the files in inode order. I don't know if it matters as much here, because we are not opening most of the files, just stat()ing them. I'd also be kind of surprised if this optimization really matters in practice these days. The inode-sorting comes from 7e8c174 (fsck-cache: sort entries by inode number, 2005-05-02). That predates packfiles by several months, I believe, and you are not likely to have a very large number of loose objects anymore (and if you do, you will get _way_ better performance simply by packing). -Peff
Re: [PATCH 13/16] prune: keep objects reachable from recent objects
On Fri, Oct 03, 2014 at 02:47:57PM -0700, Junio C Hamano wrote: With this patch applied, the system will not prune unreachable old objects that are reachable from a recent object (the recent object itself may or may not be reachable but that does not make any difference). And that is sufficient to ensure the integrity of the repository even if you allow new objects to be created reusing any of these unreachable objects that are left behind by prune, because the reachability check done during prune (with this patch applied) makes sure any object left in the repository can safely be used as a starting point of connectivity traversal. Your use of "safely" in the last sentence here made me think. In a repository that has had this patch from the beginning, it should be safe to traverse the unreachable but unpruned objects, because the property of the repository we are trying to guarantee means that we will have all of the referents. But in a repository created by current git (or one where you have been fooling around with hash-object), we might not. Our secondary traversal for recent objects may fail because we are already missing some of the referents, and prepare_revision_walk (or traverse_commit_list) will barf. That's a bad thing. I think we need to tell that second traversal that it's OK to skip missing objects that are reachable from recent objects. It's not ideal, but there's nothing better we can do. Eventually those objects will fall off the back of the expiration window, but we should not be holding up prunes (which, after all, are the thing that expires them :) ) in the meantime. I think it would be enough to just set revs->ignore_missing_links for the secondary traversal. -Peff
Re: [PATCH 01/16] foreach_alt_odb: propagate return value from callback
On Sat, Oct 04, 2014 at 12:55:13AM +0200, René Scharfe wrote:

    -void foreach_alt_odb(alt_odb_fn fn, void *cb)
    +int foreach_alt_odb(alt_odb_fn fn, void *cb)
     {
     	struct alternate_object_database *ent;
    +	int r = 0;
     
     	prepare_alt_odb();
    -	for (ent = alt_odb_list; ent; ent = ent->next)
    -		if (fn(ent, cb))
    -			return;
    +	for (ent = alt_odb_list; ent; ent = ent->next) {
    +		int r = fn(ent, cb);
    +		if (r)
    +			break;
    +	}
    +	return r;

This will always return zero. You probably shadowed r unintentionally inside the loop, right?

Eek, yes. Thanks for catching. I'll fix it in the re-roll.

-Peff
Re: [PATCH 13/16] prune: keep objects reachable from recent objects
Jeff King p...@peff.net writes: On Fri, Oct 03, 2014 at 02:47:57PM -0700, Junio C Hamano wrote: With this patch applied, the system will not prune unreachable old objects that are reachable from a recent object (the recent object itself may or may not be reachable but that does not make any difference). And that is sufficient to ensure the integrity of the repository even if you allow new objects to be created reusing any of these unreachable objects that are left behind by prune, because the reachability check done during prune (with this patch applied) makes sure any object left in the repository can safely be used as a starting point of connectivity traversal. Your use of "safely" in the last sentence here made me think. In a repository that has had this patch from the beginning, it should be safe to traverse the unreachable but unpruned objects, because the property of the repository we are trying to guarantee means that we will have all of the referents. ... Another case that may be of interest is to start a dumb HTTP commit walker to fetch new history and kill it before it manages to complete the graph and update the refs. It has the same property as running hash-object to create an unconnected cruft object after you got your repository into a safe state, but it would be a lot less easy to blame the user for the resulting breakage. Perhaps the dumb commit walkers have outlived their usefulness and we should start talking about their deprecation and eventual removal in a later version of Git?