Re: [PATCH/RFC 5/5] add tests for checking the behaviour of unset.variable

2014-10-03 Thread Matthieu Moy
Junio C Hamano gits...@pobox.com writes:

 Tanay Abhra tanay...@gmail.com writes:

 I can think of two solutions: one, leave it as it is and advertise that it
 is to be explicitly typed into the config file at the appropriate position;
 or two, change the behavior of unset.variable to unset all matching
 variables in that file, before and after. We could also change git config
 --add to append at the end of the file regardless of whether the variable
 exists or not. Which course of action do you think would be best?

 Off the top of my head, from an end-user's point of view, something
 like this would give a behaviour that is at least understandable:

  (1) forbid git config command line from touching unset.var, as
  there is no way for a user to control where a new unset.var
  goes.  And

Well, the normal use-case for unset.variable is to put it in a local
config file, to unset a variable set in another, lower-priority file.

This common use-case works with the command-line git config, and it
would be a pity to forbid the common use-case because of a particular,
unusual case.
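As a concrete sketch of that common use-case under the proposed unset.variable semantics (the variable name here is only an illustrative example): suppose a lower-priority file such as /etc/gitconfig sets core.autocrlf; the repository's local file would then carry only:

```ini
# .git/config -- cancels a value inherited from a lower-priority file;
# core.autocrlf is just an example variable
[unset]
	variable = core.autocrlf
```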

  (2) When adding or appending section.var (it may also apply to
  removing one--you need to think about it deeper), ignore
  everything that comes before the last appearance of unset.var
  that unsets the section.var variable.

That would probably be the best option from a user's point of view, but
I'd say the implementation complexity is not worth the trouble.

 Alternatively, if the syntax to unset a section.var were not

   [unset]
   variable = section.var

 but rather

   [section]
   ! variable

 or something, then the current "find the section and append at the
 end" code may work as-is.

But that would break backward compatibility rather badly: old versions
of git would stop working completely in repositories using this syntax.

Well, perhaps we can also consider that this is acceptable: just don't
use the feature for a few years if you care about backward
compatibility.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [TOY PATCH]: rebase: Add --show-files option

2014-10-03 Thread Chris Packham
Hi,

On Fri, Oct 3, 2014 at 5:42 PM, Nazri Ramliy ayieh...@gmail.com wrote:
 Hi,

 When working on a new feature branch that touches a lot of files I
 tend to make commits that affect only single files, and for very small
 changes. Since at this stage I'm experimenting a lot - trying out
 ideas, etc. - the commits tend to grow a lot (could be 50-70
 individual commits, each modifying one or two files), and I don't
 think much about the commit message beyond making a one-liner that
 explains only the gist.

 Most of the times I include the filename in the commit message to help
 me identify which commits should be squashed together later.

 Only when the feature seems to be functional do I git rebase the
 commits in order to shape the history into its final, proper form.

 When rebasing these upwards of 40+ commits, it is helpful if the
 rebase instruction sheet shows me the actual files that the commits
 affect so I made this patch (sorry I couldn't attach it inline since
 gmail eats all the tabs) that adds the --show-files option to
 git-rebase to achieve something to this effect:

 pick 996fa59 Remove autoconf submodule
  # :100644 100644 cfc8a25... 28ddb02... M   .gitmodules
 # :160000 000000 0263a9f... 0000000... D   autoconf
 ... more pick lines
 pick 4c5070f Remove automake submodule
  # :100644 100644 28ddb02... f907328... M   .gitmodules
 # :160000 000000 9042530... 0000000... D   automake

 Having the list of files shown below each commit, indented to reduce
 cluttering the pick instruction, really does help in deciding the
 reorder and squash candidates.

Sounds neat. I do similar things and do sometimes lose track of which
files are being touched by multiple "fix compile error" commits. I
haven't actually looked at your patch because gmail can't display it
in-line but it's a feature I'd use.


 The files list came from this:

   git show --raw $sha1 | awk '/^:/ {print "'"$comment_char"'\t" $0}'

 Thoughts?
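A self-contained sketch of that pipeline on a throwaway repository (the repo setup, the default '#' comment character, and the exact awk quoting are assumptions, since the original one-liner arrived with its quotes eaten):

```shell
# Run the raw-diff-to-comment pipeline on a fresh throwaway repo.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q .
git config user.email you@example.com
git config user.name you
echo hello > f.txt
git add f.txt
git commit -q -m 'add f.txt'
comment_char='#'
# prefix each raw diff line with the comment char and a tab
git show --raw HEAD | awk '/^:/ {print "'"$comment_char"'\t" $0}'
```

The raw lines keep mode/blob detail that --stat would summarize away.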

 nazri


Compile fails on OSX 10.9 with FSF GCC in common crypto

2014-10-03 Thread Volker Braun
* Works with Apple clang
* Works with FSF GCC and NO_APPLE_COMMON_CRYPTO=yes.
* Fails with FSF GCC, see below
* FWIW, git-1.8.4.4 can be compiled with FSF GCC

$ gcc --version
gcc (GCC) 4.7.3
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ./configure 
configure: Setting lib to 'lib' (the default)
configure: Will try -pthread then -lpthread to enable POSIX Threads.
configure: CHECKS for site configuration
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... cpp
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for size_t... yes
checking for working alloca.h... yes
checking for alloca... yes
configure: CHECKS for programs
checking whether we are using the GNU C compiler... (cached) yes
checking whether gcc accepts -g... (cached) yes
checking for gcc option to accept ISO C89... (cached) none needed
checking for inline... inline
checking if linker supports -R... no
checking if linker supports -Wl,-rpath,... yes
checking for gtar... no
checking for tar... tar
checking for gnudiff... no
checking for gdiff... no
checking for diff... diff
checking for asciidoc... no
configure: CHECKS for libraries
checking for SHA1_Init in -lcrypto... yes
checking for curl_global_init in -lcurl... yes
checking for XML_ParserCreate in -lexpat... yes
checking for iconv in -lc... no
checking for iconv in -liconv... yes
checking for deflateBound in -lz... yes
checking for socket in -lc... yes
checking for inet_ntop... yes
checking for inet_pton... yes
checking for hstrerror... yes
checking for basename in -lc... yes
checking for gettext in -lc... no
checking libintl.h usability... no
checking libintl.h presence... no
checking for libintl.h... no
configure: CHECKS for header files
checking sys/select.h usability... yes
checking sys/select.h presence... yes
checking for sys/select.h... yes
checking sys/poll.h usability... yes
checking sys/poll.h presence... yes
checking for sys/poll.h... yes
checking for inttypes.h... (cached) yes
checking for old iconv()... no
configure: CHECKS for typedefs, structures, and compiler characteristics
checking for socklen_t... yes
checking for struct dirent.d_ino... yes
checking for struct dirent.d_type... yes
checking for struct passwd.pw_gecos... yes
checking for struct sockaddr_storage... yes
checking for struct addrinfo... yes
checking for getaddrinfo... yes
checking for library containing getaddrinfo... none required
checking whether the platform regex can handle null bytes... yes
checking whether system succeeds to read fopen'ed directory... no
checking whether snprintf() and/or vsnprintf() return bogus value... no
configure: CHECKS for library functions
checking libgen.h usability... yes
checking libgen.h presence... yes
checking for libgen.h... yes
checking paths.h usability... yes
checking paths.h presence... yes
checking for paths.h... yes
checking libcharset.h usability... yes
checking libcharset.h presence... yes
checking for libcharset.h... yes
checking for strings.h... (cached) yes
checking for locale_charset in -liconv... yes
checking for strcasestr... yes
checking for library containing strcasestr... none required
checking for memmem... yes
checking for library containing memmem... none required
checking for strlcpy... yes
checking for library containing strlcpy... none required
checking for uintmax_t... yes
checking for strtoumax... yes
checking for library containing strtoumax... none required
checking for setenv... yes
checking for library containing setenv... none required
checking for unsetenv... yes
checking for library containing unsetenv... none required
checking for mkdtemp... yes
checking for library containing mkdtemp... none required
checking for mkstemps... yes
checking for library containing mkstemps... none required
checking for initgroups... yes
checking for library containing initgroups... none required
checking for POSIX Threads with ''... yes
configure: creating ./config.status
config.status: creating config.mak.autogen
config.status: executing config.mak.autogen commands

$ make
GIT_VERSION = 2.1.2
* new build flags
CC credential-store.o
In file included 

Re: Submodules and GIT_ALTERNATE_OBJECT_DIRECTORIES

2014-10-03 Thread Jens Lehmann

Am 30.09.2014 um 15:25 schrieb Michal Sojka:

I'd like to shorten the time needed by our continuous integration (CI)
tool to clone the source repositories. Currently the full clone takes
about 10 minutes (even from local server). Our main repository has
several submodules so the CI tool runs git submodule update --init. My
idea was to use GIT_ALTERNATE_OBJECT_DIRECTORIES to cache objects from
several submodule repositories locally. However, this does not work
because GIT_ALTERNATE_OBJECT_DIRECTORIES is considered local to the
super-project and is not propagated to the git clone for submodules
(git-submodule.sh calls clear_local_git_env).



My question is why is GIT_ALTERNATE_OBJECT_DIRECTORIES considered local
to the repository? If I could modify the command-line I would use the
git submodule update with the --reference option, which is propagated
to clones of all submodules. Letting GIT_ALTERNATE_OBJECT_DIRECTORIES
propagate to the submodules should have the same effect as --reference
option. So why is it not propagated?


Because then it would /always/ propagate? So while that would have the
same effect as using the --reference option, not using the --reference
option would behave differently, no?


Re: Submodules and GIT_ALTERNATE_OBJECT_DIRECTORIES

2014-10-03 Thread Michal Sojka
On Fri, Oct 03 2014, Jens Lehmann wrote:
 Am 30.09.2014 um 15:25 schrieb Michal Sojka:
 I'd like to shorten the time needed by our continuous integration (CI)
 tool to clone the source repositories. Currently the full clone takes
 about 10 minutes (even from local server). Our main repository has
 several submodules so the CI tool runs git submodule update --init. My
 idea was to use GIT_ALTERNATE_OBJECT_DIRECTORIES to cache objects from
 several submodule repositories locally. However, this does not work
 because GIT_ALTERNATE_OBJECT_DIRECTORIES is considered local to the
 super-project and is not propagated to the git clone for submodules
 (git-submodule.sh calls clear_local_git_env).
  
 My question is why is GIT_ALTERNATE_OBJECT_DIRECTORIES considered local
 to the repository? If I could modify the command-line I would use the
 git submodule update with the --reference option, which is propagated
 to clones of all submodules. Letting GIT_ALTERNATE_OBJECT_DIRECTORIES
 propagate to the submodules should have the same effect as --reference
 option. So why is it not propagated?

 Because then it would /always/ propagate? So while that would have the
 same effect as using the --reference option, not using the --reference
 option would behave differently, no?

That's a good reason, thanks. Fortunately, I found a way to add
--reference to the submodule update command issued by the CI tool
(Bamboo). Instead of calling git directly, the CI tool calls my git
script, which modifies the command line in case of submodule update
and then calls the real git.

Best regards,
-Michal


Feature request: Tracking info for remote branches

2014-10-03 Thread Stefan Monnier
I really like the way Git formalizes the relation between branches via
the notion of tracking.  I can see which local branch tracks which
remote branch easily, and that's very helpful.

But when I find a Git repository on the Web, I often have no idea about
the relationship between its branches; all I have are the
branches' names, which are often not sufficient.

Let's take for example https://bitbucket.org/emiliolopez/linux.git.
Among its branches I see sunxi-codec and sunxi-codec-v0.

The repository name gives me a hint that these are really
branches of the Linux kernel (rather than, say, branches of Emacs).
And the branches' names give me a hint that these are related to
support for sunxi (aka Allwinner SoCs) and more specifically support
for codecs.  I know from out-of-band info that these are for audio
codecs.

But I don't know if these branches track linux-next, sunxi-devel, or
some other branch.  I can manually try to find out, by comparing the
distance to each one of those branches.  But:
- it's costly.
- it can only be done manually, because a script wouldn't know that
  linux-next and sunxi-devel are better candidate branches than, say,
  emacs-24.
- as a consequence, front ends (bitbucket, cgit, gitweb, younameit)
  can't show this information.

So, I'd like to suggest that Git be changed so that the branch tracking
information is also maintained and made available for remote branches.
E.g. when you push a branch, the info about which is the corresponding
target branch (if any) should also be pushed to the remote repository
along with the actual new commits.  Similarly, when cloning a remote
repository, that information should be copied locally so you can ask Git
about the relationship between those remote branches you just downloaded.


Stefan



Re: [PATCH] init - Honour the global core.filemode setting

2014-10-03 Thread Torsten Bögershausen
On 2014-10-02 19.02, Junio C Hamano wrote:
 Torsten Bögershausen tbo...@web.de writes:
 
 On 2014-10-01 19.10, Junio C Hamano wrote:
 Hilco Wijbenga hilco.wijbe...@gmail.com writes:

 Perhaps I completely misunderstand the meaning of core.filemode but I
 thought it determined whether Git cared about changes in file
 properties?

 By setting it to false, you tell Git that the filesystem on which
 you placed the repository does not correctly represent the filemode
 (especially the executable bit).

 core.fileMode in git config --help reads:

core.fileMode
If false, the executable bit differences between the
index and the working tree are ignored; useful on broken
filesystems like FAT. See git-update-index(1).

 Out of my head: Could the following be a starting point:

 core.fileMode
 If false, the executable bit differences between the
 index and the working tree are ignored.
 This may be useful when visiting a Cygwin repo with a non-Cygwin
 Git client. (Should we mention msysgit? Should we mention
 JGit/EGit?)
 
 Between these two sentences, there may still be the same cognitive
 gap that may have lead to the original confusion.
 
 The first sentence says what happens, as it should.
 
 But it is not directly clear what makes the executable bit differ
 and when it is a useful thing to ignore the differences, so the
 second sentence that says "This may be useful" does not give the
 reader very much.
 
Clearly a major improvement.

Does this (still) include the original line
See linkgit:git-update-index[1]

which helps the user to add *.sh files as executable to the index, even if
core.filemode is false?
One minor improvement below.

 Here is my attempt.
 
   Tells Git if the executable bit of files in the working tree
   is to be honored.
 
   Some filesystems lose the executable bit when a file that is
   marked as executable is checked out, or check out a
   non-executable file with the executable bit on.  git init and
   git clone probe the filesystem to see if it records
   executable bit correctly when they create a new repository
   and this variable is automatically set as necessary.
 
 A repository, however, may be on a filesystem that records
 the filemode correctly, and this variable is set to 'true'
 when created, but later may be made accessible from another
 environment that loses the filemode (e.g. exporting ext4 via
 CIFS mount, visiting a Cygwin managed repository with
 MsysGit).  In such a case, it may be necessary to set this
 variable to 'false'.
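For completeness, the setting the draft documentation describes boils down to this fragment (shown here purely as an illustration):

```ini
# .git/config -- tell git this filesystem can't represent the executable bit
[core]
	fileMode = false
```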
   



Re: [PATCH] init - Honour the global core.filemode setting

2014-10-03 Thread Junio C Hamano
Torsten Bögershausen tbo...@web.de writes:

 The first sentence says what happens, as it should.
 
 But it is not directly clear what makes the executable bit differ
 and when it is a useful thing to ignore the differences, so the
 second sentence that says "This may be useful" does not give the
 reader very much.
 
 Clearly a major improvement.

 Does this (still) include the original line
 See linkgit:git-update-index[1]

 which helps the user to add *.sh files as executable to the index, even if
 core.filemode is false?

Thanks for catching; I omitted the reference by accident.

Perhaps end that sentence like this instead?

... may be necessary to set this variable to `false`, and
maintain the executable bit with `git update-index --chmod`
(see linkgit:git-update-index[1]).
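A quick sketch of that workflow on a throwaway repository (the repo setup and the file name build.sh are made-up examples):

```shell
# Record a file as executable in the index even when core.fileMode=false,
# using git update-index --chmod (see git-update-index(1)).
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q .
git config core.fileMode false
echo 'echo hi' > build.sh          # hypothetical script name
git add build.sh                   # added with mode 100644
git update-index --chmod=+x build.sh
git ls-files -s build.sh           # index entry now has mode 100755
```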



Re: [PATCH/RFC 5/5] add tests for checking the behaviour of unset.variable

2014-10-03 Thread Junio C Hamano
Matthieu Moy matthieu@grenoble-inp.fr writes:

 Junio C Hamano gits...@pobox.com writes:
 ...
 Off the top of my head, from an end-user's point of view, something
 like this would give a behaviour that is at least understandable:

Let's make sure we have the same starting point (i.e. understanding
of the limitation of the current code) in mind.

The git config [--add] section.var value UI, which does not give
the user any control over where the new definition goes in the
resulting file, and the current implementation, which finds the last
existing section and adds the var = value definition at the end
(or adds a section at the end and then adds the definition there),
are adequate for ordinary variables.

It is fine for single-valued ones that follow the last one wins
semantics; git config would add the new definition at the end and
that definition will win.

It is manageable for multi-valued variables, too.  For uses of a
multi-valued variable to track an unordered set of values, by
definition the order does not matter.

Side note.  Otherwise, the scripter needs to read the
existing ones, --unset-all them, and then add the elements
of the final list in the desired order.  It is cumbersome, but
for a single multi-valued variable it is manageable.
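The read / --unset-all / re-add dance from the side note looks like this in practice (throwaway repo; the variable name and values are arbitrary examples):

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q .
git config --add remote.origin.push refs/heads/a
git config --add remote.origin.push refs/heads/b
# reorder: wipe all values, then re-add them in the desired order
git config --unset-all remote.origin.push
git config --add remote.origin.push refs/heads/b
git config --add remote.origin.push refs/heads/a
git config --get-all remote.origin.push   # b first, then a
```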

But the unset.variable by nature wants much finer-grained control
over the order in which other ordinary variables are defined and where
it itself is placed.  No matter what improvements you attempt to make
to the implementation, because the UI is not getting enough
information from the user to learn where exactly the user wants to
add a new variable or unset.variable, it would not give you enough
flexibility to do the job.

  (1) forbid git config command line from touching unset.var, as
  there is no way for a user to control where a new unset.var
  goes.  And

 Well, the normal use-case for unset.variable is to put it in a local
 config file, to unset a variable set in another, lower-priority file.

I agree that is one major use case.

 This common use-case works with the command-line git config, and it
 would be a pity to forbid the common use-case because of a particular,
 unusual case.

Either you are being incoherent or I am not reading you right.  If
you said "If this common use-case worked with the command-line 'git
config', it would be nice, but it would be a pity because it does
not", I would understand.

If you use the command line 'git config', i.e.

git config unset.variable xyzzy.frotz

in a repository whose .git/config does not have any unset.variable,
you will add that _at the end_, which would undo what you did in
your configuration file, not just what came before yours.  Even if
you ignore more exotic cases, the command line is *not* working.

That is why I said unset.variable is unworkable with existing git
config command line.  Always appending at the end is usable for
ordinary variables, but for unset.variable, it is most likely the
least useful thing to do.  You can explain "among 47 different
things it could do, we chose to do the most useless thing, because
that is _consistent_ with how the ordinary variables are placed in
the configuration file" in the documentation, but it forgets to
question if unset.variable should be treated the same way as
ordinary variables in the first place.

Another use case would be to override what includes gave us.  I.e.

[unset]
variable = ... nullify some /etc/gitconfig values ...
[include]
path = ~/.gitcommon
[unset]
variable = ... nullify some ~/.gitcommon values ...
[xyzzy]
frotz = nitfol

Special-casing unset.variable and always adding them at the
beginning of the output *might* turn it from totally useless to
slightly usable.  At least it supports "nullify previous ones",
even though it does not help "nullify after include".

I doubt if such a change to add unset.variable always at the top is
worth it, though.

  (2) When adding or appending section.var (it may also apply to
  removing one--you need to think about it deeper), ignore
  everything that comes before the last appearance of unset.var
  that unsets the section.var variable.

 That would probably be the best option from a user's point of view, but
 I'd say the implementation complexity is not worth the trouble.

test_expect_success declares a particular behaviour as the right
thing to happen in order to protect that right behaviour from
future changes.  It is OK for you to illustrate what illogical thing
the implementation does as a caveat, and admit that the behaviour
comes because we consider the change is not worth the trouble and we
punted.  But pretending as if that is the right behaviour and the
system has to promise that the broken behaviour will be kept forever
by casting it in stone is the worst thing you can do, I am afraid.

Unfortunately, as I already said, there is no sensible behaviour to

Re: [PATCH/RFC 5/5] add tests for checking the behaviour of unset.variable

2014-10-03 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 That is why I said unset.variable is unworkable with existing git
 config command line.  Always appending at the end is usable for
 ordinary variables, but for unset.variable, it is most likely the
 least useful thing to do.  You can explain "among 47 different
 things it could do, we chose to do the most useless thing, because
 that is _consistent_ with how the ordinary variables are placed in
 the configuration file" in the documentation, but it forgets to
 question if unset.variable should be treated the same way as
 ordinary variables in the first place.

This is a tangent, but the above also applies to the include.path.


Re: [PATCH 09/24] refs.c: pass a list of names to skip to is_refname_available

2014-10-03 Thread Jonathan Nieder
Junio C Hamano wrote:
 Jonathan Nieder jrnie...@gmail.com writes:

 diff --git a/refs.c b/refs.c
 index f124c2b..6820c93 100644
 --- a/refs.c
 +++ b/refs.c
 @@ -801,14 +801,16 @@ static int names_conflict(const char *refname1, const 
 char *refname2)
 
  struct name_conflict_cb {
  const char *refname;
 -const char *oldrefname;
  const char *conflicting_refname;
 +struct string_list *skiplist;
  };

 As cbe73331 (refs: speed up is_refname_available, 2014-09-10)
 touches the same area and is now in 'master', the logic around here
 in this series needs to be reworked.

Thanks for the heads up.  (I hadn't realized the ref-transaction-1 was
part of 'master' --- yay!)  Rebased and reworked:
https://code-review.googlesource.com/1027

I'll give a week or so for review comments on this version of the
series and then post a hopefully final version.


Re: [RFC/PATCH 1/2] config: Add safe-include directive

2014-10-03 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Even though I did allude to ../project.gitconfig in the original message, I
 think there should probably be an explicit syntax to name a path that is
 relative to the root level of the working tree. People do funky things using
 $GIT_DIR and $GIT_WORK_TREE to break the .. relative to the config
 file is the root level of the working tree assumption, and also a repository
 can have a regular file .git that points at the real location of the 
 directory
 that has config in it, in which case its parent directory is very unlikely 
 to
 be the root level of the working tree.

There is another reason why I suspect that it may make the resulting
system more useful if we had a way to explicitly mark the path you
used to safeInclude (by the way, we do not do dashes in names for
configuration by convention) as referring to something inside the
project's working tree.  In a bare repository, we might want to grab
the blob at that path in HEAD (i.e. the project's primary branch)
and include its contents.



Re: [PATCH 09/24] refs.c: pass a list of names to skip to is_refname_available

2014-10-03 Thread Junio C Hamano
Jonathan Nieder jrnie...@gmail.com writes:

 Junio C Hamano wrote:
 ...
 As cbe73331 (refs: speed up is_refname_available, 2014-09-10)
 touches the same area and is now in 'master', the logic around here
 in this series needs to be reworked.

 Thanks for the heads up.  (I hadn't realized the ref-transaction-1 was
 part of 'master' --- yay!)  Rebased and reworked:
 https://code-review.googlesource.com/1027

Heh, I was sort of expecting that heavy-weight contributors who
throw many and/or large patch series would at least be paying
attention to "What's cooking" reports.

Will take a look over there (I really hate having to context switch
only to review this series, though).

 I'll give a week or so for review comments on this version of the
 series and then post a hopefully final version.

Thanks.


Re: [TOY PATCH]: rebase: Add --show-files option

2014-10-03 Thread Junio C Hamano
Nazri Ramliy ayieh...@gmail.com writes:

 When rebasing these upwards of 40+ commits, it is helpful if the
 rebase instruction sheet shows me the actual files that the commits
 affect so I made this patch (sorry I couldn't attach it inline since
 gmail eats all the tabs) that adds the --show-files option to
 git-rebase to achieve something to this effect:

 pick 996fa59 Remove autoconf submodule
  # :100644 100644 cfc8a25... 28ddb02... M   .gitmodules
 # :160000 000000 0263a9f... 0000000... D   autoconf
 ... more pick lines
 pick 4c5070f Remove automake submodule
  # :100644 100644 28ddb02... f907328... M   .gitmodules
 # :160000 000000 9042530... 0000000... D   automake

 Having the list of files shown below each commit, indented to reduce
 cluttering the pick instruction, really does help in deciding the
 reorder and squash candidates.

Sounds like a good idea to give helpful information in comment
form in the insn sheet.

Other than two minor points:

 - If I were doing this, I would have used "diff-tree --stat
   --summary" instead of "show --raw".  You can tell
   deletion/addition by paying attention to 0's and also mode
   changes, but the information density of --raw for human
   consumption is rather low.

 - Regardless of the above, I am not sure if dumping a listing of
   100+ modified paths would really help, and it might make sense to
   cap the number of paths displayed for each change.
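For comparison, the diff-tree form suggested above could look roughly like this (throwaway repo; the exact flag combination is a guess at the intent):

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q .
git config user.email you@example.com
git config user.name you
echo hello > f.txt
git add f.txt
git commit -q -m 'add f.txt'
# --stat/--summary are denser for humans than --raw; --root covers the
# initial commit, and sed turns each line into an insn-sheet comment
git diff-tree --root --no-commit-id --stat --summary HEAD | sed 's/^/# /'
```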

I didn't look at your implementation at all, though.


Re: [PATCH/RFC 5/5] add tests for checking the behaviour of unset.variable

2014-10-03 Thread Matthieu Moy
Junio C Hamano gits...@pobox.com writes:

 The git config [--add] section.var value UI, [...] finds the last
 existing section and adds the var = value definition at the end (or
 adds a section at the end and then adds the definition there) [...]

 It is fine for single-valued ones that follow the last one wins
 semantics; git config would add the new definition at the end and
 that definition will win.

Not always.

git config foo.bar old-value
git config unset.variable foo.bar
git config foo.bar new-value

One could expect the new value to be taken into account, but it is not.
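Under today's git config, which simply writes unset.variable as an ordinary key (the unset semantics themselves are the proposed feature), the file those three commands produce shows why: the third command rewrites foo.bar in place, above the [unset] entry. A runnable sketch:

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q .
git config foo.bar old-value
git config unset.variable foo.bar   # written as an ordinary key today
git config foo.bar new-value        # replaces foo.bar in place, above [unset]
cat .git/config
```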

 Well, the normal use-case for unset.variable is to put it in a local
 config file, to unset a variable set in another, lower-priority file.

 I agree that is one major use case.

 This common use-case works with the command-line git config, and it
 would be a pity to forbid the common use-case because of a particular,
 unusual case.

 Either you are being incoherent or I am not reading you right.  If
 you said If this common use-case worked with the command-line 'git
 config', it would be nice, but it would be a pity because it does
 not, I would understand.

I think you missed the "another" in my sentence above. The normal
use-case is to have foo.bar and unset.variable=foo.bar in different
files. In this case, you do not care about the position in file.

 in a repository whose .git/config does not have any unset.variable,
 you will add that _at the end_, which would undo what you did in
 your configuration file, not just what came before yours.  Even if
 you ignore more exotic cases, the command line is *not* working.

If my sysadmin has set foo.bar=boz in /etc/gitconfig, I can use

  git config [--global] unset.variable foo.bar

and it does work. Always.

Playing with the order of variables in-file is essentially useless OTOH
except for the include case you mentioned (if I want to unset a
variable in a file, I'll just delete or comment out the variable and I
don't need unset.variable).

Really, I don't see the point in making any complex plans to support the
useless part of the unset.variable feature. The reason it was designed
for already works, and $EDITOR does the job for other cases.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC 5/5] add tests for checking the behaviour of unset.variable

2014-10-03 Thread Junio C Hamano
Matthieu Moy matthieu@grenoble-inp.fr writes:

 Junio C Hamano gits...@pobox.com writes:

 The "git config [--add] section.var value" UI, [...] finds the "var = value"
 definition at the end (or adds a section at the end and then adds
 [...]

 It is fine for single-valued ones that follow the "last one wins"
 semantics; git config would add the new definition at the end and
 that definition will win.

 Not always.

 git config foo.bar old-value
 git config unset.variable foo.bar
 git config foo.bar new-value

 One could expect the new value to be taken into account, but it is not.

I think you misunderstood what I said.  With ordinary variables,
everything works fine, that is, without unset.variable, which this
series is trying to pretend as if it were just another ordinary
variable, but it is not.  You are just showing how broken it is to
treat unset.variable as just another ordinary variable, which is
what I've been telling you.

So we are in agreement.

 Well, the normal use-case for unset.variable is to put it in a local
 config file, to unset a variable set in another, lower-priority file.

 I agree that is one major use case.

 This common use-case works with the command-line git config, and it
 would be a pity to forbid the common use-case because of a particular,
 unusual case.

 Either you are being incoherent or I am not reading you right.  If
 you said "If this common use-case worked with the command-line 'git
 config', it would be nice, but it would be a pity because it does
 not", I would understand.

 I think you missed the "another" in my sentence above. The normal
 use-case is to have foo.bar and unset.variable=foo.bar in different
 files. In this case, you do not care about the position in file.

I didn't miss anything.  The reason you want to have "unset foo.bar"
in your .git/config is to override a "foo.bar = z" you have in
another place, e.g. ~/.gitconfig, which would let you sanely do
another "foo.bar = x" in .git/config for multi-value foo.bar, *if*
the unset comes before foo.bar.

But what if you have this in your .git/config file

[core]
bare = false
... standard stuff left by git init ...
[foo]
bar = x

before you add unset foo.bar there?

And that is not a contrived example.

The likely reason why you want to resort to "unset foo.bar" is that
you found that you get an unwanted foo.bar=z in addition to the
desired foo.bar=x in a repository that has the above in .git/config,
and at that point you would want to say "I want to unset the outside
influence."

And there is no "[unset] variable = foo.bar" in there yet; without
some special casing, if you treated unset.variable as if it were
just another ordinary variable, it would go after you define your
own foo.bar=x, which is not what you want.
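Concretely, appending the unset at the end would produce something like this (a sketch of the resulting file under the proposed syntax):

```ini
[core]
	bare = false
[foo]
	bar = x
[unset]
	variable = foo.bar
```

which unsets the user's own foo.bar = x along with the outside value, exactly the failure described above.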





[PATCH 01/16] foreach_alt_odb: propagate return value from callback

2014-10-03 Thread Jeff King
We check the return value of the callback and stop iterating
if it is non-zero. However, we do not make the non-zero
return value available to the caller, so they have no way of
knowing whether the operation succeeded or not (technically
they can keep their own error flag in the callback data, but
that is unlike our other for_each functions).
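The convention being adopted here — stop at the first non-zero callback return and hand that value back to the caller — can be sketched in isolation like this (hypothetical names, not git's actual API):

```c
#include <assert.h>
#include <stddef.h>

typedef int (*each_fn)(int item, void *cb_data);

/* Iterate over items; stop at, and propagate, the first
 * non-zero value returned by the callback.  Returns 0 if
 * every callback invocation returned 0. */
static int for_each_item(const int *items, size_t n, each_fn fn, void *cb_data)
{
	int r = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		r = fn(items[i], cb_data);
		if (r)
			break;
	}
	return r;
}

/* Example callback: fail (return -1) on the first negative item. */
static int reject_negative(int item, void *cb_data)
{
	(void)cb_data;
	return item < 0 ? -1 : 0;
}
```

The caller can now distinguish "iterated everything" from "a callback bailed out", without smuggling an error flag through cb_data.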

Signed-off-by: Jeff King p...@peff.net
---
 cache.h |  2 +-
 sha1_file.c | 12 
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index 8206039..cd16e25 100644
--- a/cache.h
+++ b/cache.h
@@ -1143,7 +1143,7 @@ extern void prepare_alt_odb(void);
 extern void read_info_alternates(const char * relative_base, int depth);
 extern void add_to_alternates_file(const char *reference);
 typedef int alt_odb_fn(struct alternate_object_database *, void *);
-extern void foreach_alt_odb(alt_odb_fn, void*);
+extern int foreach_alt_odb(alt_odb_fn, void*);
 
 struct pack_window {
struct pack_window *next;
diff --git a/sha1_file.c b/sha1_file.c
index c08c0cb..bae1c15 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -412,14 +412,18 @@ void add_to_alternates_file(const char *reference)
link_alt_odb_entries(alt, strlen(alt), '\n', NULL, 0);
 }
 
-void foreach_alt_odb(alt_odb_fn fn, void *cb)
+int foreach_alt_odb(alt_odb_fn fn, void *cb)
 {
struct alternate_object_database *ent;
+   int r = 0;
 
prepare_alt_odb();
-   for (ent = alt_odb_list; ent; ent = ent->next)
-   if (fn(ent, cb))
-   return;
+   for (ent = alt_odb_list; ent; ent = ent->next) {
+   r = fn(ent, cb);
+   if (r)
+   break;
+   }
+   return r;
 }
 
 void prepare_alt_odb(void)
-- 
2.1.1.566.gdb1f904



[PATCH 0/16] make prune mtime-checking more careful

2014-10-03 Thread Jeff King
At GitHub we've occasionally run across repos getting corrupted by trees
and blobs near the tip going missing. We do a lot of test merges
between branches and HEAD (this is what feeds the "OK to merge" button
on the web interface), and the objects are almost always related to
these merges. The objects are removed by prune, which doesn't realize
that they are part of an ongoing operation. Prune uses the filesystem
mtime to determine this, but we are not very thorough in making sure
that is kept up to date.

This series tries to fix that with two techniques:

  1. When we try to write an object to disk, we optimize out the write
 if we already have the object. Instead, we should still update the
 mtime of the object in this case.

  2. Treat objects reachable from recent objects as recent themselves.

 When we check that we have an object, we do not check whether we
 have all of the objects it can reach. If you have some new objects
 that refer to some old objects (e.g., you create a tree and its blob
 on day 1, delete the tree, and then create a new tree referring to
 the blob on day 2), then prune may delete the old object but not the
 new (in this case, we delete the blob but not the new tree).

 Any subsequent use of the new object will check that we have it
 (e.g., commit-tree makes sure we have the tree we feed it), but not
 other objects it can reach. This can lead to referencing a
 half-formed part of the graph.
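Technique (1) above — refresh the mtime of an object we already have instead of skipping the write entirely — boils down to a call like this (a sketch using utime(2); the helper name is illustrative, not git's actual code):

```c
#include <stdio.h>
#include <utime.h>

/* Freshen a file's mtime (and atime) to the current time
 * without rewriting its contents; passing NULL for the
 * utimbuf means "now".  Returns 0 on success, -1 on error
 * (e.g. the file does not exist). */
static int freshen_file(const char *path)
{
	return utime(path, NULL);
}
```

Prune then sees a recent mtime on the existing object and leaves it alone, just as if it had been freshly written.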

Note that this does not make prune race-free. For example, you could
check for and update the mtime of an object just as prune is deleting
it, and think that it is written when it is not. Fixing that would
require some atomic mechanism for prune to check the mtime and delete.
But I do think this series cuts us down to real race conditions, with
millisecond-ish timing. The problems we're fixing here are much worse
than that. The distance between an object being written and being
referred to may operate on human timescales (e.g., writing a commit
message). Or the time distance between two objects that refer to each
other may be days or weeks; a prune where one falls in the recent
boundary and another does not can be disastrous.

There's quite a lot of patches here, but most of them are preparatory
cleanups. The meat is in patches 13, 15, and 16.

  [01/16]: foreach_alt_odb: propagate return value from callback
  [02/16]: isxdigit: cast input to unsigned char
  [03/16]: object_array: factor out slopbuf-freeing logic
  [04/16]: object_array: add a clear function
  [05/16]: clean up name allocation in prepare_revision_walk
  [06/16]: reachable: clear pending array after walking it
  [07/16]: t5304: use test_path_is_* instead of test -f
  [08/16]: t5304: use helper to report failure of "test foo = bar"
  [09/16]: prune: factor out loose-object directory traversal
  [10/16]: count-objects: do not use xsize_t when counting object size
  [11/16]: count-objects: use for_each_loose_file_in_objdir
  [12/16]: sha1_file: add for_each iterators for loose and packed objects
  [13/16]: prune: keep objects reachable from recent objects
  [14/16]: pack-objects: refactor unpack-unreachable expiration check
  [15/16]: pack-objects: match prune logic for discarding objects
  [16/16]: write_sha1_file: freshen existing objects

Note that these aren't yet running on GitHub servers. I know that they
fix real potential problems (see the new t6501 for examples), but I
don't know for sure if they will catch the problems we have seen. The
frequency of these issues is relatively rare, so even once deployed, we
won't know for sure until a few weeks or months have passed.

-Peff


[PATCH 03/16] object_array: factor out slopbuf-freeing logic

2014-10-03 Thread Jeff King
This is not a lot of code, but it's a logical construct that
should not need to be repeated (and we are about to add a
third repetition).

Signed-off-by: Jeff King p...@peff.net
---
 object.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/object.c b/object.c
index ca9d790..14238dc 100644
--- a/object.c
+++ b/object.c
@@ -355,6 +355,12 @@ void add_object_array_with_context(struct object *obj, 
const char *name, struct
add_object_array_with_mode_context(obj, name, array, 
S_IFINVALID, context);
 }
 
+static void object_array_release_entry(struct object_array_entry *ent)
+{
+   if (ent->name != object_array_slopbuf)
+   free(ent->name);
+}
+
 void object_array_filter(struct object_array *array,
 object_array_each_func_t want, void *cb_data)
 {
@@ -367,8 +373,7 @@ void object_array_filter(struct object_array *array,
objects[dst] = objects[src];
dst++;
} else {
-   if (objects[src].name != object_array_slopbuf)
-   free(objects[src].name);
+   object_array_release_entry(&objects[src]);
}
}
array->nr = dst;
@@ -400,8 +405,7 @@ void object_array_remove_duplicates(struct object_array 
*array)
objects[array->nr] = objects[src];
array->nr++;
} else {
-   if (objects[src].name != object_array_slopbuf)
-   free(objects[src].name);
+   object_array_release_entry(&objects[src]);
}
}
 }
-- 
2.1.1.566.gdb1f904



[PATCH 04/16] object_array: add a clear function

2014-10-03 Thread Jeff King
There's currently no easy way to free the memory associated
with an object_array (and in most cases, we simply leak the
memory in a rev_info's pending array). Let's provide a
helper to make this easier to handle.

We can make use of it in list-objects.c, which does the same
thing by hand (but fails to free the name field of each
entry, potentially leaking memory).

Signed-off-by: Jeff King p...@peff.net
---
 list-objects.c |  7 +--
 object.c   | 10 ++
 object.h   |  6 ++
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 3595ee7..fad6808 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -228,11 +228,6 @@ void traverse_commit_list(struct rev_info *revs,
die("unknown pending object %s (%s)",
sha1_to_hex(obj->sha1), name);
}
-   if (revs->pending.nr) {
-   free(revs->pending.objects);
-   revs->pending.nr = 0;
-   revs->pending.alloc = 0;
-   revs->pending.objects = NULL;
-   }
+   object_array_clear(&revs->pending);
strbuf_release(&base);
 }
diff --git a/object.c b/object.c
index 14238dc..e36e33d 100644
--- a/object.c
+++ b/object.c
@@ -379,6 +379,16 @@ void object_array_filter(struct object_array *array,
array->nr = dst;
 }
 
+void object_array_clear(struct object_array *array)
+{
+   int i;
+   for (i = 0; i < array->nr; i++)
+   object_array_release_entry(&array->objects[i]);
+   free(array->objects);
+   array->objects = NULL;
+   array->nr = array->alloc = 0;
+}
+
 /*
  * Return true iff array already contains an entry with name.
  */
diff --git a/object.h b/object.h
index e028ced..2a755a2 100644
--- a/object.h
+++ b/object.h
@@ -133,6 +133,12 @@ void object_array_filter(struct object_array *array,
  */
 void object_array_remove_duplicates(struct object_array *array);
 
+/*
+ * Remove any objects from the array, freeing all used memory; afterwards
+ * the array is ready to store more objects with add_object_array().
+ */
+void object_array_clear(struct object_array *array);
+
 void clear_object_flags(unsigned flags);
 
 #endif /* OBJECT_H */
-- 
2.1.1.566.gdb1f904



[PATCH 02/16] isxdigit: cast input to unsigned char

2014-10-03 Thread Jeff King
Otherwise, callers must do so or risk triggering
-Wchar-subscript (and rightfully so; a signed char might
cause us to use a bogus negative index into the
hexval_table).

While we are dropping the now-unnecessary casts from the
caller in urlmatch.c, we can get rid of similar casts in
actually parsing the hex by using the hexval() helper, which
implicitly casts to unsigned (but note that we cannot
implement isxdigit in terms of hexval(), as it also casts
its return value to unsigned).
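The hazard can be seen in a standalone sketch: a byte above 0x7f stored in a (signed) char converts to a negative int, which would index before the start of a 256-entry table like hexval_table:

```c
#include <assert.h>

/* Casting to unsigned char first maps any byte into the
 * valid table range 0..255. */
static int safe_index(char c)
{
	return (unsigned char)c;	/* always 0..255 */
}

/* With a signed char, bytes >= 0x80 convert to negative int
 * values (-128..-1), i.e. a bogus negative array index. */
static int naive_index(signed char c)
{
	return c;
}
```

On platforms where plain char is signed, the naive version is exactly what `hexval_table[x]` would do without the cast.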

Signed-off-by: Jeff King p...@peff.net
---
 git-compat-util.h | 2 +-
 urlmatch.c| 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 0c4e663..3d5a8fb 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -677,7 +677,7 @@ extern const unsigned char sane_ctype[256];
 #define iscntrl(x) (sane_istest(x,GIT_CNTRL))
 #define ispunct(x) sane_istest(x, GIT_PUNCT | GIT_REGEX_SPECIAL | \
GIT_GLOB_SPECIAL | GIT_PATHSPEC_MAGIC)
-#define isxdigit(x) (hexval_table[x] != -1)
+#define isxdigit(x) (hexval_table[(unsigned char)(x)] != -1)
 #define tolower(x) sane_case((unsigned char)(x), 0x20)
 #define toupper(x) sane_case((unsigned char)(x), 0)
 #define is_pathspec_magic(x) sane_istest(x,GIT_PATHSPEC_MAGIC)
diff --git a/urlmatch.c b/urlmatch.c
index 3d4c54b..618d216 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -43,11 +43,11 @@ static int append_normalized_escapes(struct strbuf *buf,
from_len--;
if (ch == '%') {
if (from_len < 2 ||
-   !isxdigit((unsigned char)from[0]) ||
-   !isxdigit((unsigned char)from[1]))
+   !isxdigit(from[0]) ||
+   !isxdigit(from[1]))
return 0;
-   ch = hexval_table[(unsigned char)*from++] << 4;
-   ch |= hexval_table[(unsigned char)*from++];
+   ch = hexval(*from++) << 4;
+   ch |= hexval(*from++);
from_len -= 2;
was_esc = 1;
}
-- 
2.1.1.566.gdb1f904



[PATCH 05/16] clean up name allocation in prepare_revision_walk

2014-10-03 Thread Jeff King
When we enter prepare_revision_walk, we have zero or more
entries in our pending array. We disconnect that array
from the rev_info, and then process each entry:

  1. If the entry is a commit and the --source option is in
 effect, we keep a pointer to the object name.

  2. Otherwise, we re-add the item to the pending list with
 a blank name.

We then throw away the old array by freeing the array
itself, but do not touch the name field of each entry. For
any items of type (2), we leak the memory associated with
the name. This commit fixes that by calling object_array_clear,
which handles the cleanup for us.

That breaks (1), though, because it depends on the memory
pointed to by the name to last forever. We can solve that by
making a copy of the name. This is slightly less efficient,
but it shouldn't matter in practice, as we do it only for
the tip commits of the traversal.

Signed-off-by: Jeff King p...@peff.net
---
 revision.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/revision.c b/revision.c
index e498b7c..01cc276 100644
--- a/revision.c
+++ b/revision.c
@@ -300,7 +300,7 @@ static struct commit *handle_commit(struct rev_info *revs,
revs->limited = 1;
}
if (revs->show_source && !commit->util)
-   commit->util = (void *) name;
+   commit->util = xstrdup(name);
return commit;
}
 
@@ -2656,15 +2656,16 @@ void reset_revision_walk(void)
 
 int prepare_revision_walk(struct rev_info *revs)
 {
-   int nr = revs->pending.nr;
-   struct object_array_entry *e, *list;
+   int i;
+   struct object_array old_pending;
struct commit_list **next = &revs->commits;
 
-   e = list = revs->pending.objects;
+   memcpy(&old_pending, &revs->pending, sizeof(old_pending));
revs->pending.nr = 0;
revs->pending.alloc = 0;
revs->pending.objects = NULL;
-   while (--nr >= 0) {
+   for (i = 0; i < old_pending.nr; i++) {
+   struct object_array_entry *e = old_pending.objects + i;
struct commit *commit = handle_commit(revs, e->item, e->name);
if (commit) {
if (!(commit->object.flags & SEEN)) {
@@ -2672,10 +2673,9 @@ int prepare_revision_walk(struct rev_info *revs)
next = commit_list_append(commit, next);
}
}
-   e++;
}
if (!revs->leak_pending)
-   free(list);
+   object_array_clear(&old_pending);
 
/* Signal whether we need per-parent treesame decoration */
if (revs->simplify_merges ||
-- 
2.1.1.566.gdb1f904



[PATCH 06/16] reachable: clear pending array after walking it

2014-10-03 Thread Jeff King
We add a number of objects to our pending array, and then
process it with a combination of get_revision and walking
the pending array ourselves (to catch any non-commits). The
commits in the pending array are cleaned up automatically by
prepare_revision_walk, but we essentially leak any other
objects (they are technically still reachable from rev_info,
but no callers ever look at them or bother to clean them
up).

This is not a huge deal in practice, as the number of
non-commits tends to be small. However, a future patch will
broaden this considerably. Let's call object_array_clear to
free the memory.

Signed-off-by: Jeff King p...@peff.net
---
 reachable.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/reachable.c b/reachable.c
index 6f6835b..d99bd31 100644
--- a/reachable.c
+++ b/reachable.c
@@ -131,6 +131,8 @@ static void walk_commit_list(struct rev_info *revs,
}
die("unknown pending object %s (%s)", sha1_to_hex(obj->sha1),
name);
}
+
+   object_array_clear(&revs->pending);
 }
 
 static int add_one_reflog_ent(unsigned char *osha1, unsigned char *nsha1,
-- 
2.1.1.566.gdb1f904



Re: [PATCH 19/24] refs.c: allow listing and deleting badly named refs

2014-10-03 Thread Ronnie Sahlberg
On Thu, Oct 2, 2014 at 11:55 AM, Junio C Hamano gits...@pobox.com wrote:
 Jonathan Nieder jrnie...@gmail.com writes:

 From: Ronnie Sahlberg sahlb...@google.com
 ...
 In resolving functions, refuse to resolve refs that don't pass the
 check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME
 flag is passed.  Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to
 resolve refs that escape the refs/ directory and do not match the
 pattern [A-Z_]* (think HEAD and MERGE_HEAD).

 In locking functions, refuse to act on badly named refs unless they
 are being deleted and either are in the refs/ directory or match [A-Z_]*.

 Just like other invalid refs, flag resolved, badly named refs with the
 REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them
 in all iteration functions except for for_each_rawref.

 Flag badly named refs with a REF_BAD_NAME flag to make it easier for
 future callers to notice and handle them specially.

 In the transaction API, refuse to create or update badly named refs,
 but allow deleting them (unless they escape refs/ and don't match
 [A-Z_]*).

 Signed-off-by: Ronnie Sahlberg sahlb...@google.com
 Signed-off-by: Jonathan Nieder jrnie...@gmail.com

 Thanks.  We originally threw all the different kind of breakages
 into ISBROKEN, but a ref can have a malformed name or can contain a
 bad/non value and allowing us to tell them apart is a good direction
 to go.

 diff --git a/cache.h b/cache.h
 index 5ca7f2b..0c0ac60 100644
 --- a/cache.h
 +++ b/cache.h
 @@ -978,16 +978,26 @@ extern int read_ref(const char *refname, unsigned char 
 *sha1);
   * If flags is non-NULL, set the value that it points to the
   * combination of REF_ISPACKED (if the reference was found among the
   * packed references), REF_ISSYMREF (if the initial reference was a
 - * symbolic reference) and REF_ISBROKEN (if the ref is malformed).
 + * symbolic reference), REF_BAD_NAME (if the reference name is ill
 + * formed --- see RESOLVE_REF_ALLOW_BAD_NAME below), and REF_ISBROKEN
 + * (if the ref is malformed).

 You want to define "is malformed" here.

 The original defines REF_ISBROKEN as malformed because

  (1) resolve_ref_unsafe() uses get_sha1_hex() and read_loose_refs()
  uses read_ref_full(), both to catch malformed values stored;

  (2) resolve_ref_unsafe() uses check_refname_format() and catches
  malformed names stored as a symref target.

 I _think_ you are introducing ALLOW_BAD_NAME to allow adding the third
 class ".git/refs/remotes/origin/mal..formed..name".  I do not know
 if they should be the same class as a symref with a good name
 ".git/refs/remotes/origin/HEAD" that points at a bad name
 "mal..formed..name" (which is (2) above).  Perhaps not.  (2) is
 still about what is stored in the ref, and the ref in question may
 or may not have a well-formed name, which is orthogonal.

 So probably you left only the "value stored in the ref is malformed"
 (in other words, we expected to find a 40-hex object name but didn't
 find one) case for REF_ISBROKEN?

I updated cache.h to try to clarify it better.
The intention here is to expand the use of REF_ISBROKEN.

In all cases REF_ISBROKEN will be set. This covers both "the sha1
value is bad" and "a name is bad".

For those cases where a name is bad, REF_BAD_NAME is also set. Bad
names occur when either the ref itself has a bad name or a bad name
is encountered while resolving a chain of symbolic refs.


I.e.  REF_BAD_NAME is a special case of broken ref. All REF_BAD_NAME
refs are also REF_ISBROKEN but not the reverse.
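The invariant stated here — every REF_BAD_NAME ref is also REF_ISBROKEN, but not the reverse — could be checked with a sketch like this (flag values are illustrative, not git's actual definitions):

```c
#include <assert.h>

#define REF_ISBROKEN 0x04	/* illustrative values only */
#define REF_BAD_NAME 0x08

/* Returns 1 if the flag combination respects the documented
 * invariant: REF_BAD_NAME implies REF_ISBROKEN. */
static int flags_consistent(int flags)
{
	if (flags & REF_BAD_NAME)
		return (flags & REF_ISBROKEN) != 0;
	return 1;
}
```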



 Do we want to separate "value is not 40-hex" and "a symref points at
 a malformed refname" as separate malformed value errors?

 + * RESOLVE_REF_ALLOW_BAD_NAME allows resolving refs even when their
 + * name is invalid according to git-check-ref-format(1).  If the name
 + * is bad then the value stored in sha1 will be null_sha1 and the
 + * REF_ISBROKEN and REF_BAD_NAME flags will be set.

 Viewed with that light, I am not sure if a badly named ref should
 yield null_sha1[] (REF_ISBROKEN, which I am assuming is about a
 value that is badly formatted and cannot be read, should keep
 yielding it as before).  Wouldn't it make it harder for the user if
 you give null_sha1[] back to somebody who is trying to recover by
 reading "refs/heads/mal..formed", creating "refs/heads/sensible" to
 point at the same value and then removing the former?

 Note that I am not saying we should give back the parsed value at
 this step in the series.  Perhaps there are some existing callers
 that do not check for the ISBROKEN flag and instead say "null_sha1[]
 ref is to be rejected/ignored", in which case they may need to be
 corrected before that happens.  Or there may be some reason I
 overlooked that makes it not so useful if we returned the object
 name stored in a ref whose name is malformed.  Just wondering.

The reason for these malformed refs resolving to null_sha1 is exactly
that. There may be callers that do not check ISBROKEN, so all those
callers 

[PATCH 07/16] t5304: use test_path_is_* instead of test -f

2014-10-03 Thread Jeff King
This is slightly more robust (checking "! test -f" would not
notice a directory of the same name, though that is not
likely to happen here). It also makes debugging easier, as
the test script will output a message on failure.

Signed-off-by: Jeff King p...@peff.net
---
This patch is totally optional. I did it while debugging t5304 (strange
how badly prune works when you accidentally invert the mtime check!)
and figured it might be worth keeping as a cleanup.

 t/t5304-prune.sh | 46 +++---
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 01c6a3f..b0ffb05 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -14,7 +14,7 @@ add_blob() {
BLOB=$(echo aleph_0 | git hash-object -w --stdin) &&
BLOB_FILE=.git/objects/$(echo $BLOB | sed "s/^../&\//") &&
test $((1 + $before)) = $(git count-objects | sed "s/ .*//") &&
-   test -f $BLOB_FILE &&
+   test_path_is_file $BLOB_FILE &&
test-chmtime =+0 $BLOB_FILE
 }
 
@@ -35,9 +35,9 @@ test_expect_success 'prune stale packs' '
: > .git/objects/tmp_2.pack &&
test-chmtime =-86501 .git/objects/tmp_1.pack &&
git prune --expire 1.day &&
-   test -f $orig_pack &&
-   test -f .git/objects/tmp_2.pack &&
-   ! test -f .git/objects/tmp_1.pack
+   test_path_is_file $orig_pack &&
+   test_path_is_file .git/objects/tmp_2.pack &&
+   test_path_is_missing .git/objects/tmp_1.pack
 
 '
 
@@ -46,11 +46,11 @@ test_expect_success 'prune --expire' '
add_blob &&
git prune --expire=1.hour.ago &&
test $((1 + $before)) = $(git count-objects | sed "s/ .*//") &&
-   test -f $BLOB_FILE &&
+   test_path_is_file $BLOB_FILE &&
test-chmtime =-86500 $BLOB_FILE &&
git prune --expire 1.day &&
test $before = $(git count-objects | sed "s/ .*//") &&
-   ! test -f $BLOB_FILE
+   test_path_is_missing $BLOB_FILE
 
 '
 
@@ -60,11 +60,11 @@ test_expect_success 'gc: implicit prune --expire' '
test-chmtime =-$((2*$week-30)) $BLOB_FILE &&
git gc &&
test $((1 + $before)) = $(git count-objects | sed "s/ .*//") &&
-   test -f $BLOB_FILE &&
+   test_path_is_file $BLOB_FILE &&
test-chmtime =-$((2*$week+1)) $BLOB_FILE &&
git gc &&
test $before = $(git count-objects | sed "s/ .*//") &&
-   ! test -f $BLOB_FILE
+   test_path_is_missing $BLOB_FILE
 
 '
 
@@ -110,7 +110,7 @@ test_expect_success 'prune: do not prune detached HEAD with 
no reflog' '
git commit --allow-empty -m "detached commit" &&
# verify that there is no reflogs
# (should be removed and disabled by previous test)
-   test ! -e .git/logs &&
+   test_path_is_missing .git/logs &&
git prune -n >prune_actual &&
: >prune_expected &&
test_cmp prune_actual prune_expected
@@ -145,7 +145,7 @@ test_expect_success 'gc --no-prune' '
git config gc.pruneExpire 2.days.ago &&
git gc --no-prune &&
test 1 = $(git count-objects | sed "s/ .*//") &&
-   test -f $BLOB_FILE
+   test_path_is_file $BLOB_FILE
 
 '
 
@@ -153,10 +153,10 @@ test_expect_success 'gc respects gc.pruneExpire' '
 
git config gc.pruneExpire 5002.days.ago &&
git gc &&
-   test -f $BLOB_FILE &&
+   test_path_is_file $BLOB_FILE &&
git config gc.pruneExpire 5000.days.ago &&
git gc &&
-   test ! -f $BLOB_FILE
+   test_path_is_missing $BLOB_FILE
 
 '
 
@@ -165,9 +165,9 @@ test_expect_success 'gc --prune=date' '
add_blob &&
test-chmtime =-$((5001*$day)) $BLOB_FILE &&
git gc --prune=5002.days.ago &&
-   test -f $BLOB_FILE &&
+   test_path_is_file $BLOB_FILE &&
git gc --prune=5000.days.ago &&
-   test ! -f $BLOB_FILE
+   test_path_is_missing $BLOB_FILE
 
 '
 
@@ -175,9 +175,9 @@ test_expect_success 'gc --prune=never' '
 
add_blob &&
git gc --prune=never &&
-   test -f $BLOB_FILE &&
+   test_path_is_file $BLOB_FILE &&
git gc --prune=now &&
-   test ! -f $BLOB_FILE
+   test_path_is_missing $BLOB_FILE
 
 '
 
@@ -186,10 +186,10 @@ test_expect_success 'gc respects gc.pruneExpire=never' '
git config gc.pruneExpire never &&
add_blob &&
git gc &&
-   test -f $BLOB_FILE &&
+   test_path_is_file $BLOB_FILE &&
git config gc.pruneExpire now &&
git gc &&
-   test ! -f $BLOB_FILE
+   test_path_is_missing $BLOB_FILE
 
 '
 
@@ -197,9 +197,9 @@ test_expect_success 'prune --expire=never' '
 
add_blob &&
git prune --expire=never &&
-   test -f $BLOB_FILE &&
+   test_path_is_file $BLOB_FILE &&
git prune &&
-   test ! -f $BLOB_FILE
+   test_path_is_missing $BLOB_FILE
 
 '
 
@@ -210,10 +210,10 @@ test_expect_success 'gc: prune old objects after local 
clone' '
(
cd aclone &&
test 1 = $(git count-objects | sed "s/ .*//") &&
-   test -f $BLOB_FILE &&
+   test_path_is_file $BLOB_FILE &&
git gc --prune &&
  

[PATCH 08/16] t5304: use helper to report failure of "test foo = bar"

2014-10-03 Thread Jeff King
For small outputs, we sometimes use:

  test "$(some_cmd)" = "something we expect"

instead of a full test_cmp. The downside of this is that
when it fails, there is no output at all from the script.
Let's introduce a small helper to make tests easier to
debug.

Signed-off-by: Jeff King p...@peff.net
---
This is in the same boat as the last commit; we can drop it without
hurting the rest of the series.

Is test_eq too cutesy or obfuscated? I have often wanted it when
debugging other tests, too. Our usual technique is to do:

  echo whatever >expect &&
  do_something >actual &&
  test_cmp expect actual

That's a bit verbose. We could hide it behind something like test_eq,
too, but it introduces several extra new processes. And I know people on
some fork-challenged platforms are very sensitive to the number of
spawned processes in the test suite.

 t/t5304-prune.sh| 16 
 t/test-lib-functions.sh | 11 +++
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index b0ffb05..502860e 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -13,7 +13,7 @@ add_blob() {
before=$(git count-objects | sed "s/ .*//") &&
BLOB=$(echo aleph_0 | git hash-object -w --stdin) &&
BLOB_FILE=.git/objects/$(echo $BLOB | sed "s/^../&\//") &&
-   test $((1 + $before)) = $(git count-objects | sed "s/ .*//") &&
+   test_eq $((1 + $before)) $(git count-objects | sed "s/ .*//") &&
test_path_is_file $BLOB_FILE &&
test-chmtime =+0 $BLOB_FILE
 }
@@ -45,11 +45,11 @@ test_expect_success 'prune --expire' '
 
add_blob &&
git prune --expire=1.hour.ago &&
-   test $((1 + $before)) = $(git count-objects | sed "s/ .*//") &&
+   test_eq $((1 + $before)) $(git count-objects | sed "s/ .*//") &&
test_path_is_file $BLOB_FILE &&
test-chmtime =-86500 $BLOB_FILE &&
git prune --expire 1.day &&
-   test $before = $(git count-objects | sed "s/ .*//") &&
+   test_eq $before $(git count-objects | sed "s/ .*//") &&
test_path_is_missing $BLOB_FILE
 
 '
@@ -59,11 +59,11 @@ test_expect_success 'gc: implicit prune --expire' '
add_blob &&
test-chmtime =-$((2*$week-30)) $BLOB_FILE &&
git gc &&
-   test $((1 + $before)) = $(git count-objects | sed "s/ .*//") &&
+   test_eq $((1 + $before)) $(git count-objects | sed "s/ .*//") &&
test_path_is_file $BLOB_FILE &&
test-chmtime =-$((2*$week+1)) $BLOB_FILE &&
git gc &&
-   test $before = $(git count-objects | sed "s/ .*//") &&
+   test_eq $before $(git count-objects | sed "s/ .*//") &&
test_path_is_missing $BLOB_FILE
 
 '
@@ -144,7 +144,7 @@ test_expect_success 'gc --no-prune' '
test-chmtime =-$((5001*$day)) $BLOB_FILE &&
git config gc.pruneExpire 2.days.ago &&
git gc --no-prune &&
-   test 1 = $(git count-objects | sed "s/ .*//") &&
+   test_eq 1 $(git count-objects | sed "s/ .*//") &&
test_path_is_file $BLOB_FILE
 
 '
@@ -209,10 +209,10 @@ test_expect_success 'gc: prune old objects after local 
clone' '
git clone --no-hardlinks . aclone &&
(
cd aclone &&
-   test 1 = $(git count-objects | sed "s/ .*//") &&
+   test_eq 1 $(git count-objects | sed "s/ .*//") &&
test_path_is_file $BLOB_FILE &&
git gc --prune &&
-   test 0 = $(git count-objects | sed "s/ .*//") &&
+   test_eq 0 $(git count-objects | sed "s/ .*//") &&
test_path_is_missing $BLOB_FILE
)
 '
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index dafd6ad..0a17614 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -634,6 +634,17 @@ test_cmp_bin() {
 	cmp "$@"
 }
 
+# This is the same as 'test "$1" "$3" "$2"' except that it
+# will output a useful message to stderr on failure. If
+# "$3" is omitted, it defaults to "=".
+test_eq () {
+	if ! test "$1" "${3:-=}" "$2"
+	then
+		echo >&2 "test_eq failed: $1 ${3:-=} $2"
+		false
+	fi
+}
+
 # Check if the file expected to be empty is indeed empty, and barfs
 # otherwise.
 
-- 
2.1.1.566.gdb1f904

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
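To make the proposed helper's semantics concrete, here is a standalone sketch (the function body is reconstructed from the patch above so the snippet is self-contained; in the real suite it would live in t/test-lib-functions.sh):

```shell
# test_eq, as proposed above: compares $1 and $2 with operator $3
# (defaulting to =), echoing a diagnostic to stderr on mismatch.
test_eq () {
	if ! test "$1" "${3:-=}" "$2"
	then
		echo >&2 "test_eq failed: $1 ${3:-=} $2"
		false
	fi
}

test_eq 3 3 && echo "equal"             # default operator is =
test_eq 5 3 -gt && echo "greater"       # third argument picks the operator
test_eq 1 2 2>/dev/null || echo "mismatch -> non-zero exit"
```

The point of routing everything through one helper is that a failing comparison reports both operands, instead of the bare silent failure that `test a = b` gives inside a test_expect_success block.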


[PATCH 09/16] prune: factor out loose-object directory traversal

2014-10-03 Thread Jeff King
Prune has to walk $GIT_DIR/objects/?? in order to find the
set of loose objects to prune. Other parts of the code
(e.g., count-objects) want to do the same. Let's factor it
out into a reusable for_each-style function.

Note that this is not quite a straight code movement. There
are two differences:

  1. The original code iterated from 0 to 256, trying to
 opendir($GIT_DIR/%02x). The new code just does a
 readdir() on the object directory, and descends into
 any matching directories. This is faster on
 already-pruned repositories, and should not ever be
 slower (nobody ever creates other files in the object
 directory).

  2. The original code had strange behavior when it found a
 file of the form [0-9a-f]{2}/.{38} that did _not_
 contain all hex digits. It executed a break from the
 loop, meaning that we stopped pruning in that directory
 (but still pruned other directories!). This was
 probably a bug; we do not want to process the file as
 an object, but we should keep going otherwise.

Signed-off-by: Jeff King p...@peff.net
---
I admit the speedup in (1) almost certainly doesn't matter. It is real,
and I found out about it while writing a different program that was
basically count-objects across a large number of repositories. However
for a single repo it's probably not big enough to matter (calling
count-objects in a loop will get dominated by the startup costs). The
end result is a little more obvious IMHO, but that's subjective.

 builtin/prune.c | 87 
 cache.h | 31 +++
 sha1_file.c | 95 +
 3 files changed, 152 insertions(+), 61 deletions(-)

diff --git a/builtin/prune.c b/builtin/prune.c
index 144a3bd..8286680 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -31,11 +31,23 @@ static int prune_tmp_file(const char *fullpath)
 	return 0;
 }
 
-static int prune_object(const char *fullpath, const unsigned char *sha1)
+static int prune_object(const unsigned char *sha1, const char *fullpath,
+			void *data)
 {
 	struct stat st;
-	if (lstat(fullpath, &st))
-		return error("Could not stat '%s'", fullpath);
+
+	/*
+	 * Do we know about this object?
+	 * It must have been reachable
+	 */
+	if (lookup_object(sha1))
+		return 0;
+
+	if (lstat(fullpath, &st)) {
+		/* report errors, but do not stop pruning */
+		error("Could not stat '%s'", fullpath);
+		return 0;
+	}
 	if (st.st_mtime > expire)
 		return 0;
 	if (show_only || verbose) {
@@ -48,68 +60,20 @@ static int prune_object(const char *fullpath, const unsigned char *sha1)
 	return 0;
 }
 
-static int prune_dir(int i, struct strbuf *path)
+static int prune_cruft(const char *basename, const char *path, void *data)
 {
-	size_t baselen = path->len;
-	DIR *dir = opendir(path->buf);
-	struct dirent *de;
-
-	if (!dir)
-		return 0;
-
-	while ((de = readdir(dir)) != NULL) {
-		char name[100];
-		unsigned char sha1[20];
-
-		if (is_dot_or_dotdot(de->d_name))
-			continue;
-		if (strlen(de->d_name) == 38) {
-			sprintf(name, "%02x", i);
-			memcpy(name+2, de->d_name, 39);
-			if (get_sha1_hex(name, sha1) < 0)
-				break;
-
-			/*
-			 * Do we know about this object?
-			 * It must have been reachable
-			 */
-			if (lookup_object(sha1))
-				continue;
-
-			strbuf_addf(path, "/%s", de->d_name);
-			prune_object(path->buf, sha1);
-			strbuf_setlen(path, baselen);
-			continue;
-		}
-		if (starts_with(de->d_name, "tmp_obj_")) {
-			strbuf_addf(path, "/%s", de->d_name);
-			prune_tmp_file(path->buf);
-			strbuf_setlen(path, baselen);
-			continue;
-		}
-		fprintf(stderr, "bad sha1 file: %s/%s\n", path->buf, de->d_name);
-	}
-	closedir(dir);
-	if (!show_only)
-		rmdir(path->buf);
+	if (starts_with(basename, "tmp_obj_"))
+		prune_tmp_file(path);
+	else
+		fprintf(stderr, "bad sha1 file: %s\n", path);
 	return 0;
 }
 
-static void prune_object_dir(const char *path)
+static int prune_subdir(const char *basename, const char *path, void *data)
 {
-	struct strbuf buf = STRBUF_INIT;
-	size_t baselen;
-	int i;
-
-	strbuf_addstr(&buf, path);
-	strbuf_addch(&buf, '/');
-	baselen = buf.len;
-
-	for (i = 0; i <

[PATCH 10/16] count-objects: do not use xsize_t when counting object size

2014-10-03 Thread Jeff King
The point of xsize_t is to safely cast an off_t into a size_t
(because we are about to mmap). But in count-objects, we are
summing the sizes in an off_t. Using xsize_t means that
count-objects could fail on a 32-bit system with a 4G
object (not likely, as other parts of git would fail, but
we should at least be correct here).

Signed-off-by: Jeff King p...@peff.net
---
I think the on_disk_bytes is a little weird here, too. We count actual
disk-usage blocks for loose objects here, which makes sense. But we do
_not_ do so for packfiles, or for garbage files. Which seems kind of
inconsistent.

I kind of doubt anybody cares too much either way, though.

 builtin/count-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index a7f70cb..316a805 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -53,7 +53,7 @@ static void count_objects(DIR *d, char *path, int len, int verbose,
 			if (lstat(path, &st) || !S_ISREG(st.st_mode))
 				bad = 1;
 			else
-				(*loose_size) += xsize_t(on_disk_bytes(st));
+				(*loose_size) += on_disk_bytes(st);
 		}
 		if (bad) {
 			if (verbose) {
-- 
2.1.1.566.gdb1f904



[PATCH 11/16] count-objects: use for_each_loose_file_in_objdir

2014-10-03 Thread Jeff King
This drops our line count considerably, and should make
things more readable by keeping the counting logic separate
from the traversal.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/count-objects.c | 101 ++--
 1 file changed, 30 insertions(+), 71 deletions(-)

diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index 316a805..e47ef0b 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -11,6 +11,9 @@
 
 static unsigned long garbage;
 static off_t size_garbage;
+static int verbose;
+static unsigned long loose, packed, packed_loose;
+static off_t loose_size;
 
 static void real_report_garbage(const char *desc, const char *path)
 {
@@ -21,61 +24,31 @@ static void real_report_garbage(const char *desc, const char *path)
garbage++;
 }
 
-static void count_objects(DIR *d, char *path, int len, int verbose,
-			  unsigned long *loose,
-			  off_t *loose_size,
-			  unsigned long *packed_loose)
+static void loose_garbage(const char *path)
 {
-	struct dirent *ent;
-	while ((ent = readdir(d)) != NULL) {
-		char hex[41];
-		unsigned char sha1[20];
-		const char *cp;
-		int bad = 0;
+	if (verbose)
+		report_garbage("garbage found", path);
+}
 
-		if (is_dot_or_dotdot(ent->d_name))
-			continue;
-		for (cp = ent->d_name; *cp; cp++) {
-			int ch = *cp;
-			if (('0' <= ch && ch <= '9') ||
-			    ('a' <= ch && ch <= 'f'))
-				continue;
-			bad = 1;
-			break;
-		}
-		if (cp - ent->d_name != 38)
-			bad = 1;
-		else {
-			struct stat st;
-			memcpy(path + len + 3, ent->d_name, 38);
-			path[len + 2] = '/';
-			path[len + 41] = 0;
-			if (lstat(path, &st) || !S_ISREG(st.st_mode))
-				bad = 1;
-			else
-				(*loose_size) += on_disk_bytes(st);
-		}
-		if (bad) {
-			if (verbose) {
-				struct strbuf sb = STRBUF_INIT;
-				strbuf_addf(&sb, "%.*s/%s",
-					    len + 2, path, ent->d_name);
-				report_garbage("garbage found", sb.buf);
-				strbuf_release(&sb);
-			}
-			continue;
-		}
-		(*loose)++;
-		if (!verbose)
-			continue;
-		memcpy(hex, path+len, 2);
-		memcpy(hex+2, ent->d_name, 38);
-		hex[40] = 0;
-		if (get_sha1_hex(hex, sha1))
-			die("internal error");
-		if (has_sha1_pack(sha1))
-			(*packed_loose)++;
+static int count_loose(const unsigned char *sha1, const char *path, void *data)
+{
+	struct stat st;
+
+	if (lstat(path, &st) || !S_ISREG(st.st_mode))
+		loose_garbage(path);
+	else {
+		loose_size += on_disk_bytes(st);
+		loose++;
+		if (verbose && has_sha1_pack(sha1))
+			packed_loose++;
 	}
+	return 0;
+}
+
+static int count_cruft(const char *basename, const char *path, void *data)
+{
+	loose_garbage(path);
+	return 0;
 }
 
 static char const * const count_objects_usage[] = {
@@ -85,12 +58,7 @@ static char const * const count_objects_usage[] = {
 
 int cmd_count_objects(int argc, const char **argv, const char *prefix)
 {
-	int i, verbose = 0, human_readable = 0;
-	const char *objdir = get_object_directory();
-	int len = strlen(objdir);
-	char *path = xmalloc(len + 50);
-	unsigned long loose = 0, packed = 0, packed_loose = 0;
-	off_t loose_size = 0;
+	int human_readable = 0;
 	struct option opts[] = {
 		OPT__VERBOSE(&verbose, N_("be verbose")),
 		OPT_BOOL('H', "human-readable", &human_readable,
@@ -104,19 +72,10 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 		usage_with_options(count_objects_usage, opts);
 	if (verbose)
 		report_garbage = real_report_garbage;
-	memcpy(path, objdir, len);
-	if (len && objdir[len-1] != '/')
-		path[len++] = '/';
-	for (i = 0; i < 256; i++) {
-		DIR *d;
-		sprintf(path + len, "%02x", i);
-		d = opendir(path);
-		if (!d)
-			continue;
-		count_objects(d, path, len, verbose,
-			      &loose, &loose_size, &packed_loose);
-

[PATCH 12/16] sha1_file: add for_each iterators for loose and packed objects

2014-10-03 Thread Jeff King
We typically iterate over the reachable objects in a
repository by starting at the tips and walking the graph.
There's no easy way to iterate over all of the objects,
including unreachable ones. Let's provide a way of doing so.

Signed-off-by: Jeff King p...@peff.net
---
 cache.h | 11 +++
 sha1_file.c | 62 +
 2 files changed, 73 insertions(+)

diff --git a/cache.h b/cache.h
index 7abe7f6..3826b4b 100644
--- a/cache.h
+++ b/cache.h
@@ -1270,6 +1270,17 @@ int for_each_loose_file_in_objdir(const char *path,
  each_loose_subdir_fn subdir_cb,
  void *data);
 
+/*
+ * Iterate over loose and packed objects in both the local
+ * repository and any alternates repositories.
+ */
+typedef int each_packed_object_fn(const unsigned char *sha1,
+ struct packed_git *pack,
+ uint32_t pos,
+ void *data);
+extern int for_each_loose_object(each_loose_object_fn, void *);
+extern int for_each_packed_object(each_packed_object_fn, void *);
+
 struct object_info {
/* Request */
enum object_type *typep;
diff --git a/sha1_file.c b/sha1_file.c
index 9fdad47..d017289 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3313,3 +3313,65 @@ int for_each_loose_file_in_objdir(const char *path,
strbuf_release(buf);
return r;
 }
+
+struct loose_alt_odb_data {
+   each_loose_object_fn *cb;
+   void *data;
+};
+
+static int loose_from_alt_odb(struct alternate_object_database *alt,
+			      void *vdata)
+{
+	struct loose_alt_odb_data *data = vdata;
+	return for_each_loose_file_in_objdir(alt->base,
+					     data->cb, NULL, NULL,
+					     data->data);
+}
+
+int for_each_loose_object(each_loose_object_fn cb, void *data)
+{
+	struct loose_alt_odb_data alt;
+	int r;
+
+	r = for_each_loose_file_in_objdir(get_object_directory(),
+					  cb, NULL, NULL, data);
+	if (r)
+		return r;
+
+	alt.cb = cb;
+	alt.data = data;
+	return foreach_alt_odb(loose_from_alt_odb, &alt);
+}
+
+int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data)
+{
+	uint32_t i;
+	int r = 0;
+
+	for (i = 0; i < p->num_objects; i++) {
+		const unsigned char *sha1 = nth_packed_object_sha1(p, i);
+
+		if (!sha1)
+			return error("unable to get sha1 of object %u in %s",
+				     i, p->pack_name);
+
+		r = cb(sha1, p, i, data);
+		if (r)
+			break;
+	}
+	return r;
+}
+
+int for_each_packed_object(each_packed_object_fn cb, void *data)
+{
+	struct packed_git *p;
+	int r = 0;
+
+	prepare_packed_git();
+	for (p = packed_git; p; p = p->next) {
+		r = for_each_object_in_pack(p, cb, data);
+		if (r)
+			break;
+	}
+	return r;
+}
-- 
2.1.1.566.gdb1f904



Re: [PATCH 19/24] refs.c: allow listing and deleting badly named refs

2014-10-03 Thread Ronnie Sahlberg
On Fri, Oct 3, 2014 at 1:25 PM, Ronnie Sahlberg sahlb...@google.com wrote:
 On Thu, Oct 2, 2014 at 11:55 AM, Junio C Hamano gits...@pobox.com wrote:
 Jonathan Nieder jrnie...@gmail.com writes:

 From: Ronnie Sahlberg sahlb...@google.com
 ...
 In resolving functions, refuse to resolve refs that don't pass the
 check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME
 flag is passed.  Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to
 resolve refs that escape the refs/ directory and do not match the
 pattern [A-Z_]* (think HEAD and MERGE_HEAD).

 In locking functions, refuse to act on badly named refs unless they
 are being deleted and either are in the refs/ directory or match [A-Z_]*.

 Just like other invalid refs, flag resolved, badly named refs with the
 REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them
 in all iteration functions except for for_each_rawref.

 Flag badly named refs with a REF_BAD_NAME flag to make it easier for
 future callers to notice and handle them specially.

 In the transaction API, refuse to create or update badly named refs,
 but allow deleting them (unless they escape refs/ and don't match
 [A-Z_]*).

 Signed-off-by: Ronnie Sahlberg sahlb...@google.com
 Signed-off-by: Jonathan Nieder jrnie...@gmail.com

 Thanks.  We originally threw all the different kind of breakages
 into ISBROKEN, but a ref can have a malformed name or can contain a
 bad/non value and allowing us to tell them apart is a good direction
 to go.

 diff --git a/cache.h b/cache.h
 index 5ca7f2b..0c0ac60 100644
 --- a/cache.h
 +++ b/cache.h
 @@ -978,16 +978,26 @@ extern int read_ref(const char *refname, unsigned 
 char *sha1);
   * If flags is non-NULL, set the value that it points to the
   * combination of REF_ISPACKED (if the reference was found among the
   * packed references), REF_ISSYMREF (if the initial reference was a
 - * symbolic reference) and REF_ISBROKEN (if the ref is malformed).
 + * symbolic reference), REF_BAD_NAME (if the reference name is ill
 + * formed --- see RESOLVE_REF_ALLOW_BAD_NAME below), and REF_ISBROKEN
 + * (if the ref is malformed).

 You want to define "is malformed" here.

 The original defines REF_ISBROKEN as malformed because

  (1) resolve_ref_unsafe() uses get_sha1_hex() and read_loose_refs()
  uses read_ref_full(), both to catch malformed values stored;

  (2) resolve_ref_unsafe() uses check_refname_format() and catches
  malformed names stored as a symref target.

 I _think_ you are introducing ALLOW_BAD_NAME to allow add the third
 class .git/refs/remotes/origin/mal..formed..name.  I do not know
 if they should be the same class as a symref with a good name
 .git/refs/remotes/origin/HEAD that points at a bad name
 mal..formed..name, which is (2) above).  Perhaps not.  (2) is
 still above what is stored in the ref, and the ref in question may
 or may not have a well-formed name, which is orthogonal.

 So probably you left only the value stored in the ref is malformed
 (in other words, we expected to find 40-hex object name but didn't
 find one) case for REF_ISBROKEN?

 I updated cache.h to try to clarify it better.
 The intention here is to expand the use of REF_ISBROKEN.

 For all cases  REF_ISBROKEN will be set. This includes both the sha1
 value is bad as well as a name is bad.

 For those cases where a name is bad, REF_BAD_NAME is also set. Bad
 names are when either the ref itself has a bad name or when a bad name
 is encountered while resolving a chain of symbolic refs.


 I.e.  REF_BAD_NAME is a special case of broken ref. All REF_BAD_NAME
 refs are also REF_ISBROKEN but not the reverse.

Let me add some rationale why REF_BAD_NAME and REF_ISBROKEN are
not orthogonal properties.

I think almost all callers of these APIs are only concerned about is
the ref good or bad and they today only check REF_ISBROKEN.
I think that is a reasonable API and it allows the majority of callers
to just check this single flag.
(The alternative would be to keep all callers in sync and use a set of
flags for all bad conditions.)

A very small subset of callers are actually interested in knowing why
the ref was bad, and in particular if the ref was bad due to a name
component.
Those callers, which are aware that there are different types of
ISBROKEN, can then inspect the REF_BAD_NAME flag in order to decide
"is the ref broken due to the ref name?".


This is why I made REF_BAD_NAME a special case of REF_ISBROKEN.






 Do we want to separate "value is not 40-hex" and "a symref points at
 a malformed refname" as separate "malformed value" errors?

 + * RESOLVE_REF_ALLOW_BAD_NAME allows resolving refs even when their
 + * name is invalid according to git-check-ref-format(1).  If the name
 + * is bad then the value stored in sha1 will be null_sha1 and the
 + * REF_ISBROKEN and REF_BAD_NAME flags will be set.

 Viewed with that light, I am not sure if a badly named ref should
 yield null_sha1[] (REF_ISBROKEN, which I am assuming is about a
 value that is badly 

[PATCH 15/16] pack-objects: match prune logic for discarding objects

2014-10-03 Thread Jeff King
A recent commit taught git-prune to keep non-recent objects
that are reachable from recent ones. However, pack-objects,
when loosening unreachable objects, tries to optimize out
the write in the case that the object will be immediately
pruned. It now gets this wrong, since its rule does not
reflect the new prune code (and this can be seen by running
t6501 with a strategically placed repack).

Let's teach pack-objects similar logic.

Signed-off-by: Jeff King p...@peff.net
---
The test changes look big because of the indentation. View with -w for
a sane diff.

 builtin/pack-objects.c | 38 +++
 reachable.c|  4 +-
 reachable.h|  2 +
 t/t6501-freshen-objects.sh | 93 +++---
 4 files changed, 97 insertions(+), 40 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 2fe2ab0..1db5ea5 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -20,6 +20,8 @@
 #include streaming.h
 #include thread-utils.h
 #include pack-bitmap.h
+#include reachable.h
+#include sha1-array.h
 
 static const char *pack_usage[] = {
 	N_("git pack-objects --stdout [options...] [< ref-list | < object-list]"),
@@ -2407,6 +2409,15 @@ static int has_sha1_pack_kept_or_nonlocal(const unsigned char *sha1)
return 0;
 }
 
+/*
+ * Store a list of sha1s that should not be discarded
+ * because they are either written too recently, or are
+ * reachable from another object that was.
+ *
+ * This is filled by get_object_list.
+ */
+static struct sha1_array recent_objects;
+
 static int loosened_object_can_be_discarded(const unsigned char *sha1,
 					    unsigned long mtime)
 {
@@ -2414,6 +2425,8 @@ static int loosened_object_can_be_discarded(const unsigned char *sha1,
 		return 0;
 	if (mtime > unpack_unreachable_expiration)
 		return 0;
+	if (sha1_array_lookup(&recent_objects, sha1) >= 0)
+		return 0;
 	return 1;
 }
 
@@ -2470,6 +2483,19 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
return 0;
 }
 
+static void record_recent_object(struct object *obj,
+				 const struct name_path *path,
+				 const char *last,
+				 void *data)
+{
+	sha1_array_append(&recent_objects, obj->sha1);
+}
+
+static void record_recent_commit(struct commit *commit, void *data)
+{
+	sha1_array_append(&recent_objects, commit->object.sha1);
+}
+
 static void get_object_list(int ac, const char **av)
 {
 	struct rev_info revs;
@@ -2517,10 +2543,22 @@ static void get_object_list(int ac, const char **av)
 	mark_edges_uninteresting(&revs, show_edge);
 	traverse_commit_list(&revs, show_commit, show_object, NULL);
 
+	if (unpack_unreachable_expiration) {
+		if (add_unseen_recent_objects_to_traversal(&revs,
+				unpack_unreachable_expiration))
+			die("unable to add recent objects");
+		if (prepare_revision_walk(&revs))
+			die("revision walk setup failed");
+		traverse_commit_list(&revs, record_recent_commit,
+				     record_recent_object, NULL);
+	}
+
 	if (keep_unreachable)
 		add_objects_in_unpacked_packs(&revs);
 	if (unpack_unreachable)
 		loosen_unused_packed_objects(&revs);
+
+	sha1_array_clear(&recent_objects);
 }
 
 static int option_parse_index_version(const struct option *opt,
diff --git a/reachable.c b/reachable.c
index f443265..5e35f21 100644
--- a/reachable.c
+++ b/reachable.c
@@ -298,8 +298,8 @@ static int add_recent_packed(const unsigned char *sha1,
return 0;
 }
 
-static int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
- unsigned long timestamp)
+int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
+  unsigned long timestamp)
 {
struct recent_data data;
int r;
diff --git a/reachable.h b/reachable.h
index 141fe30..d23efc3 100644
--- a/reachable.h
+++ b/reachable.h
@@ -2,6 +2,8 @@
 #define REACHEABLE_H
 
 struct progress;
+extern int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
+ unsigned long timestamp);
 extern void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 				   unsigned long mark_recent, struct progress *);
 
diff --git a/t/t6501-freshen-objects.sh b/t/t6501-freshen-objects.sh
index de941c2..e25c47d 100755
--- a/t/t6501-freshen-objects.sh
+++ b/t/t6501-freshen-objects.sh
@@ -39,50 +39,67 @@ commit () {
 	git commit -m "$1"
 }
 
-test_expect_success 'disable reflogs' '
-	git config core.logallrefupdates false &&
-	rm -rf .git/logs
-'
+maybe_repack () {
+	if

[PATCH 14/16] pack-objects: refactor unpack-unreachable expiration check

2014-10-03 Thread Jeff King
When we are loosening unreachable packed objects, we do not
bother to process objects that would simply be pruned
immediately anyway. The would be pruned check is a simple
comparison, but is about to get more complicated. Let's pull
it out into a separate function.

Note that this is slightly less efficient than the original,
which avoided even opening old packs, since no object in
them could pass the current check, which cares only about
the pack mtime.  But the new rules will depend on the exact
object, so we need to perform the check even for old packs.

Note also that we fix a minor buglet when the pack mtime is
exactly the same as the expiration time. The prune code
considers that worth pruning, whereas our check here
considered it worth keeping. This wasn't a big deal. Besides
being unlikely to happen, the result was simply that the
object was loosened and then pruned, missing the
optimization. Still, we can easily fix it while we are here.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/pack-objects.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d391934..2fe2ab0 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2407,6 +2407,16 @@ static int has_sha1_pack_kept_or_nonlocal(const unsigned char *sha1)
return 0;
 }
 
+static int loosened_object_can_be_discarded(const unsigned char *sha1,
+					    unsigned long mtime)
+{
+	if (!unpack_unreachable_expiration)
+		return 0;
+	if (mtime > unpack_unreachable_expiration)
+		return 0;
+	return 1;
+}
+
 static void loosen_unused_packed_objects(struct rev_info *revs)
 {
 	struct packed_git *p;
@@ -2417,17 +2427,14 @@ static void loosen_unused_packed_objects(struct rev_info *revs)
 		if (!p->pack_local || p->pack_keep)
 			continue;
 
-		if (unpack_unreachable_expiration &&
-		    p->mtime < unpack_unreachable_expiration)
-			continue;
-
 		if (open_pack_index(p))
 			die("cannot open pack index");
 
 		for (i = 0; i < p->num_objects; i++) {
 			sha1 = nth_packed_object_sha1(p, i);
 			if (!packlist_find(&to_pack, sha1, NULL) &&
-			    !has_sha1_pack_kept_or_nonlocal(sha1))
+			    !has_sha1_pack_kept_or_nonlocal(sha1) &&
+			    !loosened_object_can_be_discarded(sha1, p->mtime))
 				if (force_object_loose(sha1, p->mtime))
 					die("unable to force loose object");
 		}
-- 
2.1.1.566.gdb1f904



Re: [PATCH 19/24] refs.c: allow listing and deleting badly named refs

2014-10-03 Thread Junio C Hamano
Ronnie Sahlberg sahlb...@google.com writes:

 On Thu, Oct 2, 2014 at 11:55 AM, Junio C Hamano gits...@pobox.com wrote:
 ...
 Thanks.  We originally threw all the different kind of breakages
 into ISBROKEN, but a ref can have a malformed name or can contain a
 bad/non value and allowing us to tell them apart is a good direction
 to go.
 ...

 I updated cache.h to try to clarify it better.
 The intention here is to expand the use of REF_ISBROKEN.

 For all cases  REF_ISBROKEN will be set. This includes both the sha1
 value is bad as well as a name is bad.

OK.  As long as it is documented in a way to help other people who
touch the code later to tell what these REF_* mean, either way is
fine between BADNAME is just one of the possible ways a ref is
ISBROKEN and BADNAME is a way and ISBROKEN is another way for a
ref to be bad, and I agree that the former (i.e. the way you
defined) is easier on the existing code.

Thanks.


[PATCH 13/16] prune: keep objects reachable from recent objects

2014-10-03 Thread Jeff King
Our current strategy with prune is that an object falls into
one of three categories:

  1. Reachable (from ref tips, reflogs, index, etc).

  2. Not reachable, but recent (based on the --expire time
 and the file's mtime).

  3. Not reachable and not recent.

We keep objects from (1) and (2), but prune objects in (3).
The point of (2) is that these objects may be part of an
in-progress operation that has not yet updated any refs.

However, it is not always the case that objects for an
in-progress operation will have a recent mtime. For example,
the object database may have an old copy of a blob (from an
abandoned operation, a branch that was deleted, etc). If we
create a new tree that points to it, a simultaneous prune
will leave our tree, but delete the blob. Referencing that
tree with a commit will then work (we check that the tree is
in the object database, but not that all of its referred
objects are), as will mentioning the commit in a ref. But
the resulting repo is corrupt; we are missing the blob
reachable from a ref.

One way to solve this is to be more thorough when
referencing a sha1: make sure that not only do we have that
sha1, but that we have the objects it refers to, and so
forth recursively. The problem is that this is very
expensive.  Creating a parent link would require traversing
the entire object graph down to the roots.

Instead, this patch pushes the extra work onto prune, which
runs less frequently (and has to look at the whole object
graph anyway). It creates a new category of objects: objects
which are not recent, but which are reachable from a recent
object. We do not prune these objects, just like the
reachable and recent ones.

This lets us avoid the recursive check above, because if we
have an object, even if it is unreachable, we should have
its referent:

  - if we are creating new objects, then we cannot create
the parent object without having the child

  - and if we are pruning objects, we will not prune the child
    if we are keeping the parent

The big exception would be if one were to write the object
in a way that avoided referential integrity (e.g., using
hash-object). But if you are in the habit of doing that, you
deserve what you get.

Naively, the simplest way to implement this would be to add
all recent objects as tips to the reachability traversal.
However, this does not perform well. In a recently-packed
repository, all reachable objects will also be recent, and
therefore we have to consider each object twice (both as a
tip, and when we reach it in the traversal). I tested this,
and it added about 10s to a 30s prune on linux.git. This
patch instead performs the normal reachability traversal
first, then follows up with a second traversal for recent
objects, skipping any that have already been marked.

Signed-off-by: Jeff King p...@peff.net
---
I put the mark-recent code into mark_reachable_objects here,
but it does not technically have to be there. It reuses the
same rev_info object (which is convenient), but the SEEN
flags from the first traversal are marked on the global
commit objects themselves. So we could break it out into a
separate function.

However, we'd have to refactor the progress reporting; the
numbers are kept internally to mark_reachable, and we would
want to continue them for the second traversal (though I
suppose you could start a second progress meter with
Checking recent objects or something if you wanted).

 builtin/prune.c|   2 +-
 builtin/reflog.c   |   2 +-
 reachable.c| 111 +
 reachable.h|   3 +-
 t/t6501-freshen-objects.sh |  88 +++
 5 files changed, 203 insertions(+), 3 deletions(-)
 create mode 100755 t/t6501-freshen-objects.sh

diff --git a/builtin/prune.c b/builtin/prune.c
index 8286680..a965574 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -135,7 +135,7 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 	if (show_progress)
 		progress = start_progress_delay(_("Checking connectivity"), 0, 0, 2);
 
-	mark_reachable_objects(&revs, 1, progress);
+	mark_reachable_objects(&revs, 1, expire, progress);
 	stop_progress(&progress);
 	for_each_loose_file_in_objdir(get_object_directory(), prune_object,
 				      prune_cruft, prune_subdir, NULL);
diff --git a/builtin/reflog.c b/builtin/reflog.c
index e8a8fb1..80bddc2 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -649,7 +649,7 @@ static int cmd_reflog_expire(int argc, const char **argv, const char *prefix)
 		init_revisions(&cb.revs, prefix);
 		if (cb.verbose)
 			printf("Marking reachable objects...");
-		mark_reachable_objects(&cb.revs, 0, NULL);
+		mark_reachable_objects(&cb.revs, 0, 0, NULL);
 		if (cb.verbose)
 			putchar('\n');
 	}
diff --git 

[PATCH 16/16] write_sha1_file: freshen existing objects

2014-10-03 Thread Jeff King
When we try to write a loose object file, we first check
whether that object already exists. If so, we skip the
write as an optimization. However, this can interfere with
prune's strategy of using mtimes to mark files in progress.

For example, if a branch contains a particular tree object
and is deleted, that tree object may become unreachable, and
have an old mtime. If a new operation then tries to write
the same tree, this ends up as a noop; we notice we
already have the object and do nothing. A prune running
simultaneously with this operation will see the object as
old, and may delete it.

We can solve this by freshening objects that we avoid
writing by updating their mtime. The algorithm for doing so
is essentially the same as that of has_sha1_file. Therefore
we provide a new (static) interface check_and_freshen,
which finds and optionally freshens the object. It's trivial
to implement freshening and simple checking by tweaking a
single parameter.

Signed-off-by: Jeff King p...@peff.net
---
 sha1_file.c| 51 +++---
 t/t6501-freshen-objects.sh | 27 
 2 files changed, 71 insertions(+), 7 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index d017289..d263b38 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -442,27 +442,53 @@ void prepare_alt_odb(void)
read_info_alternates(get_object_directory(), 0);
 }
 
-static int has_loose_object_local(const unsigned char *sha1)
+static int freshen_file(const char *fn)
 {
-   return !access(sha1_file_name(sha1), F_OK);
+   struct utimbuf t;
+   t.actime = t.modtime = time(NULL);
+   return utime(fn, &t);
 }
 
-int has_loose_object_nonlocal(const unsigned char *sha1)
+static int check_and_freshen_file(const char *fn, int freshen)
+{
+   if (access(fn, F_OK))
+   return 0;
+   if (freshen && freshen_file(fn))
+   return 0;
+   return 1;
+}
+
+static int check_and_freshen_local(const unsigned char *sha1, int freshen)
+{
+   return check_and_freshen_file(sha1_file_name(sha1), freshen);
+}
+
+static int check_and_freshen_nonlocal(const unsigned char *sha1, int freshen)
 {
struct alternate_object_database *alt;
prepare_alt_odb();
for (alt = alt_odb_list; alt; alt = alt->next) {
fill_sha1_path(alt->name, sha1);
-   if (!access(alt->base, F_OK))
+   if (check_and_freshen_file(alt->base, freshen))
return 1;
}
return 0;
 }
 
+static int check_and_freshen(const unsigned char *sha1, int freshen)
+{
+   return check_and_freshen_local(sha1, freshen) ||
+  check_and_freshen_nonlocal(sha1, freshen);
+}
+
+int has_loose_object_nonlocal(const unsigned char *sha1)
+{
+   return check_and_freshen_nonlocal(sha1, 0);
+}
+
 static int has_loose_object(const unsigned char *sha1)
 {
-   return has_loose_object_local(sha1) ||
-  has_loose_object_nonlocal(sha1);
+   return check_and_freshen(sha1, 0);
 }
 
 static unsigned int pack_used_ctr;
@@ -2949,6 +2975,17 @@ static int write_loose_object(const unsigned char *sha1, 
char *hdr, int hdrlen,
return move_temp_to_file(tmp_file, filename);
 }
 
+static int freshen_loose_object(const unsigned char *sha1)
+{
+   return check_and_freshen(sha1, 1);
+}
+
+static int freshen_packed_object(const unsigned char *sha1)
+{
+   struct pack_entry e;
+   return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
+}
+
 int write_sha1_file(const void *buf, unsigned long len, const char *type, 
unsigned char *returnsha1)
 {
unsigned char sha1[20];
@@ -2961,7 +2998,7 @@ int write_sha1_file(const void *buf, unsigned long len, 
const char *type, unsign
write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
if (returnsha1)
hashcpy(returnsha1, sha1);
-   if (has_sha1_file(sha1))
+   if (freshen_loose_object(sha1) || freshen_packed_object(sha1))
return 0;
return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
 }
diff --git a/t/t6501-freshen-objects.sh b/t/t6501-freshen-objects.sh
index e25c47d..157f3f9 100755
--- a/t/t6501-freshen-objects.sh
+++ b/t/t6501-freshen-objects.sh
@@ -100,6 +100,33 @@ for repack in '' true; do
test_expect_success "repository passes fsck ($title)" '
git fsck
'
+
+   test_expect_success "abandon objects again ($title)" '
+   git reset --hard HEAD^ &&
+   find .git/objects -type f |
+   xargs test-chmtime -v -86400
+   '
+
+   test_expect_success "start writing new commit with same tree ($title)" '
+   tree=$(
+   GIT_INDEX_FILE=index.tmp &&
+   export GIT_INDEX_FILE &&
+   git read-tree HEAD &&
+   add abandon &&
+   add unrelated &&
+   git write-tree
+   )
+

Re: [PATCH 16/16] write_sha1_file: freshen existing objects

2014-10-03 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 When we try to write a loose object file, we first check
 whether that object already exists. If so, we skip the
 write as an optimization. However, this can interfere with
 prune's strategy of using mtimes to mark files in progress.

 For example, if a branch contains a particular tree object
 and is deleted, that tree object may become unreachable, and
 have an old mtime. If a new operation then tries to write
 the same tree, this ends up as a noop; we notice we
 already have the object and do nothing. A prune running
 simultaneously with this operation will see the object as
 old, and may delete it.

 We can solve this by freshening objects that we avoid
 writing by updating their mtime. The algorithm for doing so
 is essentially the same as that of has_sha1_file. Therefore
 we provide a new (static) interface check_and_freshen,
 which finds and optionally freshens the object. It's trivial
 to implement freshening and simple checking by tweaking a
 single parameter.

An old referent of a recent unreachable object may be in a pack.  Is it
expected that the same pack will have many similar old objects (in
other words, is it worth trying to optimize check-and-freshen by
bypassing access() and utime(), perhaps by keeping a "freshened in
this process already" flag in struct packed_git)?

Could check-and-freshen-nonlocal() ever be called with freshen set
to true?  Should it be?  In other words, should we be mucking with
objects in other people's repositories with utime()?

Other than these two minor questions, I am happy with this patch.

Thanks.

 Signed-off-by: Jeff King p...@peff.net
 ---
  sha1_file.c| 51 
 +++---
  t/t6501-freshen-objects.sh | 27 
  2 files changed, 71 insertions(+), 7 deletions(-)

 diff --git a/sha1_file.c b/sha1_file.c
 index d017289..d263b38 100644
 --- a/sha1_file.c
 +++ b/sha1_file.c
 @@ -442,27 +442,53 @@ void prepare_alt_odb(void)
   read_info_alternates(get_object_directory(), 0);
  }
  
 -static int has_loose_object_local(const unsigned char *sha1)
 +static int freshen_file(const char *fn)
  {
 - return !access(sha1_file_name(sha1), F_OK);
 + struct utimbuf t;
 + t.actime = t.modtime = time(NULL);
 + return utime(fn, &t);
  }
  
 -int has_loose_object_nonlocal(const unsigned char *sha1)
 +static int check_and_freshen_file(const char *fn, int freshen)
 +{
 + if (access(fn, F_OK))
 + return 0;
 + if (freshen && freshen_file(fn))
 + return 0;
 + return 1;
 +}
 +
 +static int check_and_freshen_local(const unsigned char *sha1, int freshen)
 +{
 + return check_and_freshen_file(sha1_file_name(sha1), freshen);
 +}
 +
 +static int check_and_freshen_nonlocal(const unsigned char *sha1, int freshen)
  {
   struct alternate_object_database *alt;
   prepare_alt_odb();
   for (alt = alt_odb_list; alt; alt = alt->next) {
   fill_sha1_path(alt->name, sha1);
 - if (!access(alt->base, F_OK))
 + if (check_and_freshen_file(alt->base, freshen))
   return 1;
   }
   return 0;
  }
  
 +static int check_and_freshen(const unsigned char *sha1, int freshen)
 +{
 + return check_and_freshen_local(sha1, freshen) ||
 +check_and_freshen_nonlocal(sha1, freshen);
 +}
 +
 +int has_loose_object_nonlocal(const unsigned char *sha1)
 +{
 + return check_and_freshen_nonlocal(sha1, 0);
 +}
 +
  static int has_loose_object(const unsigned char *sha1)
  {
 - return has_loose_object_local(sha1) ||
 -has_loose_object_nonlocal(sha1);
 + return check_and_freshen(sha1, 0);
  }
  
  static unsigned int pack_used_ctr;
 @@ -2949,6 +2975,17 @@ static int write_loose_object(const unsigned char 
 *sha1, char *hdr, int hdrlen,
   return move_temp_to_file(tmp_file, filename);
  }
  
 +static int freshen_loose_object(const unsigned char *sha1)
 +{
 + return check_and_freshen(sha1, 1);
 +}
 +
 +static int freshen_packed_object(const unsigned char *sha1)
 +{
 + struct pack_entry e;
 + return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
 +}
 +
  int write_sha1_file(const void *buf, unsigned long len, const char *type, 
 unsigned char *returnsha1)
  {
   unsigned char sha1[20];
 @@ -2961,7 +2998,7 @@ int write_sha1_file(const void *buf, unsigned long len, 
 const char *type, unsign
  write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
   if (returnsha1)
   hashcpy(returnsha1, sha1);
 - if (has_sha1_file(sha1))
 + if (freshen_loose_object(sha1) || freshen_packed_object(sha1))
   return 0;
   return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
  }
 diff --git a/t/t6501-freshen-objects.sh b/t/t6501-freshen-objects.sh
 index e25c47d..157f3f9 100755
 --- a/t/t6501-freshen-objects.sh
 +++ b/t/t6501-freshen-objects.sh
 @@ -100,6 +100,33 @@ for repack in '' true; do
   test_expect_success 

[PATCH v22.5 09/24] refs.c: pass a list of names to skip to is_refname_available

2014-10-03 Thread Jonathan Nieder
From: Ronnie Sahlberg sahlb...@google.com
Date: Thu, 1 May 2014 11:16:07 -0700

commit 23acf975d74825789112a3a7ba97dbbdc76904f4 upstream.

Change is_refname_available to take a list of strings to exclude when
checking for conflicts instead of just one single name. We can already
exclude a single name for the sake of renames. This generalizes that support.

ref_transaction_commit already tracks a set of refs that are being deleted
in an array.  This array is then used to exclude refs from being written to
the packed-refs file.  At some stage we will want to change this array to a
struct string_list and then we can pass it to is_refname_available via the
call to lock_ref_sha1_basic.  That will allow us to perform transactions
that perform multiple renames as long as there are no conflicts within the
starting or ending state.

For example, that would allow a single transaction that contains two
renames that are both individually conflicting:

   m -> n/n
   n -> m/m

No functional change intended yet.

Change-Id: I8c68f0a47eef25759d572436ba683cb7b1ccb7eb
Signed-off-by: Ronnie Sahlberg sahlb...@google.com
Signed-off-by: Jonathan Nieder jrnie...@gmail.com
---
Junio C Hamano wrote:
 Jonathan Nieder jrnie...@gmail.com writes:

 Thanks for the heads up.  (I hadn't realized the ref-transaction-1 was
 part of 'master' --- yay!)  Rebased and reworked:
 https://code-review.googlesource.com/1027

 Heh, I was sort of expecting that heavy-weight contributors that
 throw many and/or large patch series would at least be paying
attention to "What's cooking" reports.

That's fair.  I don't pay enough attention to the 'graduated' section.

 Will take a look over there (I really hate having to context switch
 only to review this series, though).

Here's a copy to save a trip.

 refs.c | 50 --
 1 file changed, 32 insertions(+), 18 deletions(-)

diff --git a/refs.c b/refs.c
index 83b2c77..05e42a7 100644
--- a/refs.c
+++ b/refs.c
@@ -785,13 +785,13 @@ static void prime_ref_dir(struct ref_dir *dir)
}
 }
 
-static int entry_matches(struct ref_entry *entry, const char *refname)
+static int entry_matches(struct ref_entry *entry, const struct string_list 
*list)
 {
-   return refname && !strcmp(entry->name, refname);
+   return list && string_list_has_string(list, entry->name);
 }
 
 struct nonmatching_ref_data {
-   const char *skip;
+   const struct string_list *skip;
struct ref_entry *found;
 };
 
@@ -815,16 +815,19 @@ static void report_refname_conflict(struct ref_entry 
*entry,
 /*
  * Return true iff a reference named refname could be created without
  * conflicting with the name of an existing reference in dir.  If
- * oldrefname is non-NULL, ignore potential conflicts with oldrefname
- * (e.g., because oldrefname is scheduled for deletion in the same
+ * skip is non-NULL, ignore potential conflicts with refs in skip
+ * (e.g., because they are scheduled for deletion in the same
  * operation).
  *
  * Two reference names conflict if one of them exactly matches the
 * leading components of the other; e.g., "foo/bar" conflicts with
 * both "foo" and with "foo/bar/baz" but not with "foo/bar" or
 * "foo/barbados".
+ *
+ * skip must be sorted.
  */
-static int is_refname_available(const char *refname, const char *oldrefname,
+static int is_refname_available(const char *refname,
+   const struct string_list *skip,
struct ref_dir *dir)
 {
const char *slash;
@@ -838,12 +841,12 @@ static int is_refname_available(const char *refname, 
const char *oldrefname,
 * looking for a conflict with a leaf entry.
 *
 * If we find one, we still must make sure it is
-* not oldrefname.
+* not in skip.
 */
pos = search_ref_dir(dir, refname, slash - refname);
if (pos = 0) {
struct ref_entry *entry = dir-entries[pos];
-   if (entry_matches(entry, oldrefname))
+   if (entry_matches(entry, skip))
return 1;
report_refname_conflict(entry, refname);
return 0;
@@ -876,13 +879,13 @@ static int is_refname_available(const char *refname, 
const char *oldrefname,
/*
 * We found a directory named refname. It is a
 * problem iff it contains any ref that is not
-* oldrefname.
+* in skip.
 */
struct ref_entry *entry = dir-entries[pos];
struct ref_dir *dir = get_ref_dir(entry);
struct nonmatching_ref_data data;
 
-   data.skip = oldrefname;
+   data.skip = skip;
sort_ref_dir(dir);
if (!do_for_each_entry_in_dir(dir, 0, nonmatching_ref_fn, &data))
return 

Re: [PATCH 13/16] prune: keep objects reachable from recent objects

2014-10-03 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 Instead, this patch pushes the extra work onto prune, which
 runs less frequently (and has to look at the whole object
 graph anyway). It creates a new category of objects: objects
 which are not recent, but which are reachable from a recent
 object. We do not prune these objects, just like the
 reachable and recent ones.

 This lets us avoid the recursive check above, because if we
 have an object, even if it is unreachable, we should have
 its referent:

   - if we are creating new objects, then we cannot create
 the parent object without having the child

   - and if we are pruning objects, will not prune the child
 if we are keeping the parent

Sorry but this part is beyond a simple panda brain.

I can understand this

If we have an object, even if it is unreachable, we
should have its referent.

as a description of the desired behaviour.  If we have a tree that
is unreachable, we must make sure that we do not discard blobs that
are reachable from that tree, or we would end up corrupting our
repository if we ever allow that tree to become reachable from our
refs later.

But how does that connect to these two bullet points?

   - if we are creating new objects, then we cannot create
 the parent object without having the child

We cannot create the parent (e.g. tree) without having the child
(e.g. blob that is referred to by the tree we are creating).
So this bullet point is repeating the same thing?

   - and if we are pruning objects, will not prune the child
 if we are keeping the parent

We will not prune blob that are reachable from a tree that we
are not yet ready to prune.  So this again is repeating the same
thing?

But these are "this is how we want our system to behave".  And if we
assume our system behaves like so, then prune would be safe.

But it is unclear how that behaviour is realized.  Puzzled...

... goes and thinks ...

With this patch applied, the system will not prune unreachable old
objects that are reachable from a recent object (the recent object
itself may or may not be reachable but that does not make any
difference).  And that is sufficient to ensure the integrity of the
repository even if you allow new objects to be created reusing any
of these unreachable objects that are left behind by prune, because
the reachability check done during prune (with this patch applied)
makes sure any object left in the repository can safely be used as a
starting point of connectivity traversal.

Ok, I think I got it now, but then do we still need to utime(2) the
loose object files for unreachable objects that are referenced by
a recent object (which is done in a later patch), or is that purely
an optimization for the next round of gc where you would have more
recent objects (i.e. you do not have to traverse to find out an old
one is reachable from a new one, as there will be fewer old ones)?




--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/16] t5304: use test_path_is_* instead of test -f

2014-10-03 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 This is slightly more robust (checking "! test -f" would not
 notice a directory of the same name, though that is not
 likely to happen here). It also makes debugging easier, as
 the test script will output a message on failure.

 Signed-off-by: Jeff King p...@peff.net
 ---
 This patch is totally optional. I did it while debugging t5304 (strange
 how badly prune works when you accidentally invert the mtime check!)
 and figured it might be worth keeping as a cleanup.

Looks sensible; thanks.


Re: [PATCH 08/16] t5304: use helper to report failure of "test foo = bar"

2014-10-03 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 For small outputs, we sometimes use:

   test "$(some_cmd)" = "something we expect"

 instead of a full test_cmp. The downside of this is that
 when it fails, there is no output at all from the script.
 Let's introduce a small helper to make tests easier to
 debug.

 Signed-off-by: Jeff King p...@peff.net
 ---
 This is in the same boat as the last commit; we can drop it without
 hurting the rest of the series.

 Is test_eq too cutesy or obfuscated? 

Not really.  As long as its two strings are not too long, the
output may still be readable.

 I have often wanted it when
 debugging other tests, too. Our usual technique is to do:

   echo whatever >expect &&
   do_something >actual &&
   test_cmp expect actual

 That's a bit verbose. We could hide it behind something like test_eq,
 too, but it introduces several extra new processes.

What do you mean by extra new processes?  Whether open coded in a
verbose way, or wrapped inside a helper, e.g.

test_eql () {
echo "$1" >expect &&
shift &&
"$@" >actual &&
test_cmp expect actual
}
...
test_eql whatever do_something

the number of processes would be the same, no?

Or do you mean test_cmp is an extra process compared with

test_eq whatever $(do_something)

Hopefully, do_something does something more than what takes test_cmp
to run, so I wouldn't be worried too much about it.


Re: [PATCH 09/16] prune: factor out loose-object directory traversal

2014-10-03 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 Prune has to walk $GIT_DIR/objects/?? in order to find the
 set of loose objects to prune. Other parts of the code
 (e.g., count-objects) want to do the same. Let's factor it
 out into a reusable for_each-style function.

Doesn't fsck also look at these as well?  I recall that we also have
a "sort by inum" trick employed there.  Would it also be applicable
to these two callers?


Re: [PATCH 0/16] make prune mtime-checking more careful

2014-10-03 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 ... The objects are removed by prune, which doesn't realize
 that they are part of an ongoing operation. Prune uses the filesystem
 mtime to determine this, but we are not very thorough in making sure
 that is kept up to date.

The whole series looked quite sensible.  Thanks.


[PATCH] bundle: plug minor memory leak in is_tag_in_date_range()

2014-10-03 Thread René Scharfe
Free the buffer returned by read_sha1_file() even if no valid tagger
line is found.

Signed-off-by: Rene Scharfe l@web.de
---
 bundle.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/bundle.c b/bundle.c
index b2b89fe..9ed865c 100644
--- a/bundle.c
+++ b/bundle.c
@@ -211,24 +211,28 @@ static int is_tag_in_date_range(struct object *tag, 
struct rev_info *revs)
enum object_type type;
char *buf, *line, *lineend;
unsigned long date;
+   int result = 1;
 
if (revs->max_age == -1 && revs->min_age == -1)
-   return 1;
+   goto out;
 
buf = read_sha1_file(tag->sha1, &type, &size);
if (!buf)
-   return 1;
+   goto out;
line = memmem(buf, size, "\ntagger ", 8);
if (!line++)
-   return 1;
+   goto out_free;
lineend = memchr(line, '\n', buf + size - line);
line = memchr(line, '>', lineend ? lineend - line : buf + size - line);
if (!line++)
-   return 1;
+   goto out_free;
date = strtoul(line, NULL, 10);
-   free(buf);
-   return (revs->max_age == -1 || revs->max_age < date) &&
+   result = (revs->max_age == -1 || revs->max_age < date) &&
(revs->min_age == -1 || revs->min_age > date);
+out_free:
+   free(buf);
+out:
+   return result;
 }
 
 int create_bundle(struct bundle_header *header, const char *path,
-- 
2.1.2



Re: [PATCH 01/16] foreach_alt_odb: propagate return value from callback

2014-10-03 Thread René Scharfe

Am 03.10.2014 um 22:21 schrieb Jeff King:

We check the return value of the callback and stop iterating
if it is non-zero. However, we do not make the non-zero
return value available to the caller, so they have no way of
knowing whether the operation succeeded or not (technically
they can keep their own error flag in the callback data, but
that is unlike our other for_each functions).

Signed-off-by: Jeff King p...@peff.net
---
  cache.h |  2 +-
  sha1_file.c | 12 
  2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index 8206039..cd16e25 100644
--- a/cache.h
+++ b/cache.h
@@ -1143,7 +1143,7 @@ extern void prepare_alt_odb(void);
  extern void read_info_alternates(const char * relative_base, int depth);
  extern void add_to_alternates_file(const char *reference);
  typedef int alt_odb_fn(struct alternate_object_database *, void *);
-extern void foreach_alt_odb(alt_odb_fn, void*);
+extern int foreach_alt_odb(alt_odb_fn, void*);

  struct pack_window {
struct pack_window *next;
diff --git a/sha1_file.c b/sha1_file.c
index c08c0cb..bae1c15 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -412,14 +412,18 @@ void add_to_alternates_file(const char *reference)
link_alt_odb_entries(alt, strlen(alt), '\n', NULL, 0);
  }

-void foreach_alt_odb(alt_odb_fn fn, void *cb)
+int foreach_alt_odb(alt_odb_fn fn, void *cb)
  {
struct alternate_object_database *ent;
+   int r = 0;

prepare_alt_odb();
-   for (ent = alt_odb_list; ent; ent = ent->next)
-   if (fn(ent, cb))
-   return;
+   for (ent = alt_odb_list; ent; ent = ent->next) {
+   int r = fn(ent, cb);
+   if (r)
+   break;
+   }
+   return r;


This will always return zero.  You probably shadowed "r" unintentionally 
inside the loop, right?



  }

  void prepare_alt_odb(void)





Re: [PATCH] bundle: plug minor memory leak in is_tag_in_date_range()

2014-10-03 Thread Jonathan Nieder
René Scharfe wrote:

 --- a/bundle.c
 +++ b/bundle.c
 @@ -211,24 +211,28 @@ static int is_tag_in_date_range(struct object *tag, 
 struct rev_info *revs)
   enum object_type type;
   char *buf, *line, *lineend;

If buf is initialized to NULL, there is no need for separate out and
out_free labels.

With or without such a change,
Reviewed-by: Jonathan Nieder jrnie...@gmail.com

diff --git i/bundle.c w/bundle.c
index 4158e11..101cde0 100644
--- i/bundle.c
+++ w/bundle.c
@@ -209,7 +209,7 @@ static int is_tag_in_date_range(struct object *tag, struct 
rev_info *revs)
 {
unsigned long size;
enum object_type type;
-   char *buf, *line, *lineend;
+   char *buf = NULL, *line, *lineend;
unsigned long date;
int result = 1;
 
@@ -221,17 +221,16 @@ static int is_tag_in_date_range(struct object *tag, 
struct rev_info *revs)
goto out;
line = memmem(buf, size, "\ntagger ", 8);
if (!line++)
-   goto out_free;
+   goto out;
lineend = memchr(line, '\n', buf + size - line);
line = memchr(line, '>', lineend ? lineend - line : buf + size - line);
if (!line++)
-   goto out_free;
+   goto out;
date = strtoul(line, NULL, 10);
result = (revs->max_age == -1 || revs->max_age < date) &&
(revs->min_age == -1 || revs->min_age > date);
-out_free:
-   free(buf);
 out:
+   free(buf);
return result;
 }
 


Re: [PATCH 16/16] write_sha1_file: freshen existing objects

2014-10-03 Thread Jeff King
On Fri, Oct 03, 2014 at 02:29:58PM -0700, Junio C Hamano wrote:

  We can solve this by freshening objects that we avoid
  writing by updating their mtime. The algorithm for doing so
  is essentially the same as that of has_sha1_file. Therefore
  we provide a new (static) interface check_and_freshen,
  which finds and optionally freshens the object. It's trivial
  to implement freshening and simple checking by tweaking a
  single parameter.
 
 An old referent by a recent unreachable may be in pack.  Is it
 expected that the same pack will have many similar old objects (in
 other words, is it worth trying to optimize check-and-freshen by
 bypassing access() and utime(), perhaps by keeping a "freshened in
 this process already" flag in struct packed_git)?

Thanks for reminding me. I considered something like that early on and
then completely forgot to revisit it. I do not have numbers either way
on whether it is an optimization worth doing. On the one hand, it is
very easy to do.  On the other, it probably does not make a big
difference; we are literally skipping the write of an entire object, and
have just run a complete sha1 over the contents. A single utime() call
probably is not a big deal.

 Could check-and-freshen-nonlocal() ever be called with freshen set
 to true?  Should it be?  In other words, should we be mucking with
 objects in other people's repositories with utime()?

Yes, it can, and I think the answer to "should" is yes for safety,
though I agree it feels a little hacky. I did explicitly write it so
that we fail-safe when freshening doesn't work. That is, if we try to
freshen an object that is in an alternate and we cannot (e.g., because
we don't have write access), we'll fallback to writing out a new loose
object locally.

That's very much the safest thing to do, but obviously it performs less
well. Again, this is the code path where we _would have_ written out the
object anyway, so it might not be that bad. But I don't know to what
degree the current code relies on that optimization for reasonable
performance. E.g., if you clone from a read-only alternate and then try
to `git write-tree` immediately on the index, will we literally make a
full copy of each tree object?

Hmm, that should be easy to test...

  $ su - nobody
  $ git clone -s ~peff/compile/linux /tmp/foo
  $ cd /tmp/foo

  $ git count-objects
  0 objects, 0 kilobytes
  $ git write-tree
  $ git count-objects
  0 objects, 0 kilobytes

So far so good. Let's blow away the cache-tree to make sure...

  $ rm .git/index
  $ git read-tree HEAD
  $ git write-tree
  $ git count-objects
  0 objects, 0 kilobytes

So that's promising. But it's far from a proof that there isn't some
other code path that will be negatively impacted.

-Peff


Re: [PATCH 13/16] prune: keep objects reachable from recent objects

2014-10-03 Thread Jeff King
On Fri, Oct 03, 2014 at 02:47:57PM -0700, Junio C Hamano wrote:

 Sorry but this part is beyond a simple panda brain.

I probably didn't explain it very well. I found it rather confusing to
reason about. Let's see if we can go through it together.

 I can understand this
 
   If we have an object, even if it is unreachable, we
   should have its referent.
 
 as a description of the desired behaviour.  If we have a tree that
 is unreachable, we must make sure that we do not discard blobs that
 are reachable from that tree, or we would end up corrupting our
 repository if we ever allow that tree to become reachable from our
 refs later.

Yes, exactly.

- if we are creating new objects, then we cannot create
  the parent object without having the child
 
 We cannot create the parent (e.g. tree) without having the child
 (e.g. blob that is referred to by the tree we are creating).
 So this bullet point is repeating the same thing?

Sort of. The first statement "if we have an object, we should have its
referent" is a desired state. The bullet points are trying to reason
about the _changes_ in state, and argue that there are no state changes
that take us from a valid state to an invalid one.

There are two actions that affect the state of the object database:
adding objects and removing objects.

We cannot go from a valid state to an invalid state by adding objects,
because we cannot create the parent without having the child. That is
already the case before this patch (though as noted, you can cheat
if you know the sha1 from another repository and bypass git's safety
checks, but I do not think that is a case worth worrying about).

- and if we are pruning objects, will not prune the child
  if we are keeping the parent
 
 We will not prune blob that are reachable from a tree that we
 are not yet ready to prune.  So this again is repeating the same
 thing?

This is the other action. When removing objects, we will keep the child
along with the parent, and therefore we cannot move to an invalid state.
That's the part that this patch does.

 With this patch applied, the system will not prune unreachable old
 objects that are reachable from a recent object (the recent object
 itself may or may not be reachable but that does not make any
 difference).  And that is sufficient to ensure the integrity of the
 repository even if you allow new objects to be created reusing any
 of these unreachable objects that are left behind by prune, because
 the reachability check done during prune (with this patch applied)
 makes sure any object left in the repository can safely be used as a
 starting point of connectivity traversal.

Yes, exactly.

 Ok, I think I got it now, but then do we still need to utime(2) the
 loose object files for unreachable objects that are referenced by
 a recent object (which is done in a later patch), or is that purely
 an optimization for the next round of gc where you would have more
 recent objects (i.e. you do not have to traverse to find out an old
 one is reachable from a new one, as there will be fewer old ones)?

No, we don't need to utime() the loose objects. As long as there is a
recently-written object that refers to them, they are considered worth
keeping.

The later patch that calls utime() on objects is really about defeating
the write_sha1_file optimization. That is, you might _think_ you have
written a recent object that refers to other objects, but the sha1-file
code silently turned it into a noop.

Does that make more sense?

-Peff


Re: [PATCH 08/16] t5304: use helper to report failure of "test foo = bar"

2014-10-03 Thread Jeff King
On Fri, Oct 03, 2014 at 03:17:18PM -0700, Junio C Hamano wrote:

  That's a bit verbose. We could hide it behind something like test_eq,
  too, but it introduces several extra new processes.
 
 What do you mean by extra new processes?  Whether open coded in a
 verbose way, or wrapped inside a helper, e.g.
 
   test_eql () {
   echo "$1" >expect &&
 shift &&
 "$@" >actual &&
 test_cmp expect actual
   }
 ...
   test_eql whatever do_something
 
 the number of processes would be the same, no?
 
 Or do you mean test_cmp is an extra process compared with
 
   test_eq whatever $(do_something)

Sorry, yeah, I meant new processes versus "test $foo = $bar".

 Hopefully, do_something does something more than what takes test_cmp
 to run, so I wouldn't be worried too much about it.

Yeah, I may just be overly paranoid here. If we are not worried about a
few extra processes, then the test_eql you showed above may be
preferable, because its output is uniform with other test_cmp tests
(although maybe it also introduces problems, because it does not handle
stray whitespace in the same way, and it puts extra files in the working
tree).

-Peff


Re: [PATCH 09/16] prune: factor out loose-object directory traversal

2014-10-03 Thread Jeff King
On Fri, Oct 03, 2014 at 03:19:29PM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  Prune has to walk $GIT_DIR/objects/?? in order to find the
  set of loose objects to prune. Other parts of the code
  (e.g., count-objects) want to do the same. Let's factor it
  out into a reusable for_each-style function.
 
 Doesn't fsck also look at these as well?  I recall that we also have
 a "sort by inum" trick employed there.  Would it also be applicable
 to these two callers?

I didn't think to check fsck. It does have a similar function and could
be converted.

Fsck does store the inode and later visits the files in inode order. I
don't know if it matters as much here, because we are not opening most
of the files, just stat()ing them.

I'd also be kind of surprised if this optimization really matters in
practice these days. The inode-sorting comes from 7e8c174 (fsck-cache:
sort entries by inode number, 2005-05-02). That predates packfiles by
several months, I believe, and you are not likely to have a very large
number of loose objects anymore (and if you do, you will get _way_
better performance simply by packing).

-Peff


Re: [PATCH 13/16] prune: keep objects reachable from recent objects

2014-10-03 Thread Jeff King
On Fri, Oct 03, 2014 at 02:47:57PM -0700, Junio C Hamano wrote:

 With this patch applied, the system will not prune unreachable old
 objects that are reachable from a recent object (the recent object
 itself may or may not be reachable but that does not make any
 difference).  And that is sufficient to ensure the integrity of the
 repository even if you allow new objects to be created reusing any
 of these unreachable objects that are left behind by prune, because
 the reachability check done during prune (with this patch applied)
 makes sure any object left in the repository can safely be used as a
 starting point of connectivity traversal.

Your use of "safely" in the last sentence here made me think.

In a repository that has had this patch from the beginning, it should be
safe to traverse the unreachable but unpruned objects, because the
property of the repository we are trying to guarantee means that we will
have all of the referents.

But in a repository created by current git (or one where you have been
fooling around with hash-object), we might not. Our secondary traversal
for recent objects may fail because we are already missing some of the
referents, and prepare_revision_walk (or traverse_commit_list) will
barf. That's a bad thing.

I think we need to tell that second traversal that it's OK to skip
missing objects that are reachable from recent objects. It's not ideal,
but there's nothing better we can do.  Eventually those objects will
fall off the back of the expiration window, but we should not be holding
up prunes (which, after all, are the thing that expires them :) ) in the
meantime.

I think it would be enough to just set revs->ignore_missing_links for
the secondary traversal.

-Peff


Re: [PATCH 01/16] foreach_alt_odb: propagate return value from callback

2014-10-03 Thread Jeff King
On Sat, Oct 04, 2014 at 12:55:13AM +0200, René Scharfe wrote:

 -void foreach_alt_odb(alt_odb_fn fn, void *cb)
 +int foreach_alt_odb(alt_odb_fn fn, void *cb)
  {
  	struct alternate_object_database *ent;
 +	int r = 0;
 
  	prepare_alt_odb();
 -	for (ent = alt_odb_list; ent; ent = ent->next)
 -		if (fn(ent, cb))
 -			return;
 +	for (ent = alt_odb_list; ent; ent = ent->next) {
 +		int r = fn(ent, cb);
 +		if (r)
 +			break;
 +	}
 +	return r;
 
 This will always return zero.  You probably shadowed r unintentionally
 inside the loop, right?

Eek, yes. Thanks for catching. I'll fix it in the re-roll.

-Peff


Re: [PATCH 13/16] prune: keep objects reachable from recent objects

2014-10-03 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Fri, Oct 03, 2014 at 02:47:57PM -0700, Junio C Hamano wrote:

 With this patch applied, the system will not prune unreachable old
 objects that are reachable from a recent object (the recent object
 itself may or may not be reachable but that does not make any
 difference).  And that is sufficient to ensure the integrity of the
 repository even if you allow new objects to be created reusing any
 of these unreachable objects that are left behind by prune, because
 the reachability check done during prune (with this patch applied)
 makes sure any object left in the repository can safely be used as a
 starting point of connectivity traversal.

 Your use of "safely" in the last sentence here made me think.

 In a repository that has had this patch from the beginning, it should be
 safe to traverse the unreachable but unpruned objects, because the
 property of the repository we are trying to guarantee means that we will
 have all of the referents.
 ...

Another case that may be of interest is to start a dumb HTTP commit
walker to fetch new history and kill it before it manages to
complete the graph and update the refs.  It has the same property as
running hash-object to create an unconnected cruft object after you
got your repository into a safe state, but it would be a lot harder
to blame the user for the resulting breakage.

Perhaps the dumb commit walkers outlived their usefulness and we
should start talking about their deprecation and eventual removal at
a later version of Git?

