from:"Michael"

I hope a I have a "leap forward"

On 7/30/2018 11:39 AM, Ævar Arnfjörð Bjarmason wrote:
> Perhaps it's worth taking a step back here and thinking about whether
> this whole thing is unworkable. It was hard enough to get this to work
> on the combination of Linux, *BSD and Solaris, but I suspect we'll run
> into increasingly obscure platforms where this is hard or impossible
> (AIX, HP/UX etc.)
While I still cannot say for HP/UX it does seem there is a potential
solution based on the status for _LITTLE_ENDIAN and _BIG_ENDIAN. At
least, gcc on POWER and xlc on POWER provides one or the other - and my
hope is that gcc on other platforms also provides them.

For "other" compilers that do not provide them - a modification to
CFLAGS to define one or the other should make "make" work.

Details (note - I am not a programmer, so by definition at least one of
my "macros" will be wrong :)

AIX and xlc
root@x066:[/]xlc   -qshowmacros -E /dev/null | grep -i endi
1506-297 (S) Unable to open input file null. A file or directory in the
path name does not exist..
#define __HHW_BIG_ENDIAN__ 1
#define __BIG_ENDIAN__ 1
#define __THW_BIG_ENDIAN__ 1
#define _BIG_ENDIAN 1

On SLES12 (le) and xlc
suse12test:~/images/littleEndian/sles # xlc -qshowmacros -dM -E x.c |
grep -i endi
#define _LITTLE_ENDIAN 1
#define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
#define __FLOAT_WORD_ORDER__ __ORDER_LITTLE_ENDIAN__
#define __LITTLE_ENDIAN__ 1
#define __ORDER_BIG_ENDIAN__ 4321
#define __ORDER_LITTLE_ENDIAN__ 1234
#define __ORDER_PDP_ENDIAN__ 3412
#define __VEC_ELEMENT_REG_ORDER__ __ORDER_LITTLE_ENDIAN__

Based on what I can see on gcc on POWER and xlc on POWER I think an
approach (simplified) can be:

#if undefined(_BIG_ENDIAN) && undef(_LITTLE_ENDIAN)
#error "one of _BIG_ENDIAN or _LITTLE_ENDIAN must be defined. Try adding
the correct value to CFLAGS"
#else defined(_BIG_ENDIAN) && defined(_LITTLE_ENDIAN)
#error "Only one of _BIG_ENDIAN and _LITTLE_ENDIAN may be defined, not both"
#endif

And then logic based on the value set.
This should also make cross-compile possible by unsetting an incorrect
default and setting the correct value.

p.s. Is there a setting I need to set somewhere so I receive a copy of
the email sent after it is received by the list. I could send myself a
copy, but I much prefer it comes from the maillist - as verification it
was received.

Re: Is detecting endianness at compile-time unworkable?



On 7/30/2018 11:39 AM, Ævar Arnfjörð Bjarmason wrote:
> The reason we're in this hole is because we use this
> sha1collisiondetection library to do SHA-1, and the reason we have
> issues with it specifically (not OpenSSL et al) is because its only
> method of detecting endianness is at compile time.
When using gcc (no xlc available for Linux on Power)

POWER6 (Big Endian by definition)
root@x068:[/data/httpd/gcc]gcc -dM -E - < /dev/null | grep -i end
#define __ORDER_LITTLE_ENDIAN__ 1234
#define __BIG_ENDIAN__ 1
#define __FLOAT_WORD_ORDER__ __ORDER_BIG_ENDIAN__
#define __ORDER_PDP_ENDIAN__ 3412
#define _BIG_ENDIAN 1
#define __ORDER_BIG_ENDIAN__ 4321
#define __BYTE_ORDER__ __ORDER_BIG_ENDIAN__

SLES12 on POWER8
suse12test:~ # gcc -dM -E - < /dev/null | grep -i end
#define __ORDER_LITTLE_ENDIAN__ 1234
#define _LITTLE_ENDIAN 1
#define __FLOAT_WORD_ORDER__ __ORDER_LITTLE_ENDIAN__
#define __ORDER_PDP_ENDIAN__ 3412
#define __LITTLE_ENDIAN__ 1
#define __ORDER_BIG_ENDIAN__ 4321
#define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__

*So, for compile time tests, when gcc is the compiler it seems the
following defines are available**
**__BIG_ENDIAN__, _BIG_ENDIAN,  __LITTLE__ENDIAN__, _LITTLE_ENDIAN**
**or something based on the value of __BYTE_ORDER__*

I'll see if I can find something similar for xlc, but will only be able
to test xlc on AIX.

>
> This didn't use to be the case, it was changed in this commit:
> https://github.com/cr-marcstevens/sha1collisiondetection/commit/d597672
>
> Dan Shumow: Since the commit message doesn't say why, can you elaborate
> a bit on why this was done, i.e. is determining this at runtime harmful
> for performance? If not, perhaps it would be best to bring this back, at
> least as an option.

Re: Is detecting endianness at compile-time unworkable?


A small step back...


On 7/30/2018 11:39 AM, Ævar Arnfjörð Bjarmason wrote:

On Sun, Jul 29 2018, Michael wrote:


On 29/07/2018 22:06, brian m. carlson wrote:

On Sun, Jul 29, 2018 at 09:48:43PM +0200, Michael wrote:

On 29/07/2018 21:27, brian m. carlson wrote:

Well, that explains it.  I would recommend submitting a patch to
https://github.com/cr-marcstevens/sha1collisiondetection, and the we can
pull in the updated submodule with that fix.

Not sure I am smart enough to do that. I'll have to download, build, and see
what it says.

The issue is that somewhere in lib/sha1.c, you need to cause
SHA1DC_BIGENDIAN to be set.  That means you need to figure out what
compiler macro might indicate that.

I remember - roughly - a few decades back - having an assignment to
write code to determine endianness. PDP and VAC were different iirc,
and many other micro-processors besides the 8088/8086/z85/68k/etc..

If you are looking for a compiler macro as a way to determine this -
maybe you have one for gcc, but not for xlc. I do not know it - currently :)

I'm not familiar with AIX, but from searching around I found this
porting manual from IBM:
http://www.redbooks.ibm.com/redbooks/pdfs/sg246034.pdf
This is from July 2001 - when AIX 5L, for Linux affinity, was new. AIX 
was (nearly) the #1 posix system, and linux was a minor player in the 
data center (in or out (now as IAAS)). IMHO, the recommendations made in 
2001 are probably no longer applicable (64-bit was fairly new, e.g., 
rather than common).


There they suggest either defining your own macros, or testing the
memory layout at runtime (see section "2.2.2.3 Technique 3: Testing
memory layout" and surrounding sections).

Perhaps it's worth taking a step back here and thinking about whether
this whole thing is unworkable. It was hard enough to get this to work
on the combination of Linux, *BSD and Solaris, but I suspect we'll run
into increasingly obscure platforms where this is hard or impossible
(AIX, HP/UX etc.)

The reason we're in this hole is because we use this
sha1collisiondetection library to do SHA-1, and the reason we have
issues with it specifically (not OpenSSL et al) is because its only
method of detecting endianness is at compile time.
Cannot speak for the "others", but as I have mentioned before - as AIX 
in only on POWER it is also only Big Endian - so a compiletime #if 
testing for _AIX will work fine

This didn't use to be the case, it was changed in this commit:
https://github.com/cr-marcstevens/sha1collisiondetection/commit/d597672

Dan Shumow: Since the commit message doesn't say why, can you elaborate
a bit on why this was done, i.e. is determining this at runtime harmful
for performance? If not, perhaps it would be best to bring this back, at
least as an option.

And, as an aside, the reason we can't easily make it better ourselves is
because the build process for git.git doesn't have a facility to run
code to detect this type of stuff (the configure script is always
optional). So we can't just run this test ourselves.
On AIX - I am required to run configure, and frankly, I am amazed that 
not everyone is running it. Among other things I an modifying the prefix 
(to /opt) and many of the others to different /var/git/* areas as I do 
not want to "polute" the BOS (base OS) and/or other packages/packagers. 
Officially, according the Linux FHS-3.0 I sould be using /opt/aixtools 
as prefix.


FYI: my current build process is:
wget git-{version}.tar.xz
xz -dc git-{version}.tar.xz
cd git-{version}.tar.xz
mv Makefile Makefile.git #my build scripts only run configure when 
Makefile does not exist

./configure ...
ln Makefile.git Makefile
make

I am amazed, as it rarely happens (maybe git is my first encounter) - 
that configure does not create a Makefile. This also complicates 
building git "out of tree".


Junio: I've barked up that particular tree before in
https://public-inbox.org/git/87a7x3kmh5@evledraar.gmail.com/ and I
won't bore you all by repeating myself, except to say that this is yet
another case where I wish we had a hard dependency on some way of doing
checks via compiled code in our build system.
For AIX: again - the determination is simple. If _AIX is set to 1 then 
use BigEndian, or, use:

michael@x071:[/home/michael]uname
AIX
 i.e., something like:
$(uname) == "AIX" && BigEndian=1

Re: Is detecting endianness at compile-time unworkable?

I have just replied to 
https://github.com/cr-marcstevens/sha1collisiondetection/pull/42

I checked a gcc compiler on AIX, and I have the defines for vac.

I do not have access yet to SLES or RHEL (or Ubuntu), just a "free 
Debian" on my Power6.

* my conclusions|recommendations:

a) AIX is always Big Endian, the define _AIX can be used to determine if AIX

b) POWER7 and earlier are always Big Endian

c) assuming lscpu is always available on Linux systems a command (in 
configure?) could be used:

root@x074:/usr/bin# lscpu | grep -i endian
Byte Order:    Big Endian

d) some linux systems (in any case latest versions of RHEL and SLES 
enterprise) should have a file named lparcfg in /proc 
(/proc/{powerppc|ppc64|ppc64le|ppc64el}/lparcfg - and it might be in 
that file. Need to get onto a (POWER8|POWER9) system to check.

Hope this helps:

details re: define of _AIX

root@x068:[/data/httpd/gcc]gcc -dM -E - < /dev/null | grep AIX | head -1
#define _AIX 1

michael@x071:[/home/michael]/usr/bin/grep -p DEFLT: 
/etc/vac.cfg.[567][123] | grep options\
    options   = 
-D_AIX,-D_AIX32,-D_AIX41,-D_AIX43,-D_AIX50,-D_AIX51,-D_AIX52,-D_AIX53,-D_IBMR2,-D_POWER
    options   = 
-D_AIX,-D_AIX32,-D_AIX41,-D_AIX43,-D_AIX50,-D_AIX51,-D_AIX52,-D_AIX53,-D_AIX61,-D_IBMR2,-D_POWER
    options   = 
-D_AIX,-D_AIX32,-D_AIX41,-D_AIX43,-D_AIX50,-D_AIX51,-D_AIX52,-D_AIX53,-D_AIX61,-D_AIX71,-D_IBMR2,-D_POWER
    options   = 
-D_AIX,-D_AIX32,-D_AIX41,-D_AIX43,-D_AIX50,-D_AIX51,-D_AIX52,-D_AIX53,-D_AIX61,-D_AIX71,-D_AIX72,-D_IBMR2,-D_POWER

michael@x071:[/home/michael]ls /etc/vac.cfg.[567][123]
/etc/vac.cfg.53  /etc/vac.cfg.61  /etc/vac.cfg.71  /etc/vac.cfg.72

On 7/30/2018 8:39 PM, Daniel Shumow wrote:
The change was definitely made for performance. Removing the if 
statements, conditioned upon endianess was an approx 10% improvement, 
which was very important to getting this library accepted into git.

Thanks,
Dan

On Mon, Jul 30, 2018 at 11:32 AM, Junio C Hamano <mailto:gits...@pobox.com>> wrote:

Junio C Hamano mailto:gits...@pobox.com>> writes:

> Ævar Arnfjörð Bjarmason mailto:ava...@gmail.com>> writes:
>
>> And, as an aside, the reason we can't easily make it better
ourselves is
>> because the build process for git.git doesn't have a facility
to run
>> code to detect this type of stuff (the configure script is always
>> optional). So we can't just run this test ourselves.
>
> It won't help those who cross-compile anyway.  I thought we declared
> "we make a reasonable effort to guess the target endianness from the
> system header by inspecting usual macros, but will not aim to cover
> every system on the planet---instead there is a knob to tweak it for
> those on exotic platforms" last time we discussed this?

Well, having said all that, I do not think I personally mind if
./configure learned to include a "compile small program and run it
to determine byte order on the build machine" as part of "we make a
reasonable effort" as long as it cleanly excludes cross building
case (and the result is made overridable just in case we misdetect
the "cross-ness" of the build).

Re: git broken for AIX somewhere between 2.13.2 and 2.13.3

2018-07-30 Thread Michael


On 29/07/2018 23:40, Andreas Schwab wrote:

On Jul 29 2018, Ævar Arnfjörð Bjarmason  wrote:


Also, to you and anyone else with access to AIX: I'd be happy to figure
these issues out pro-actively if you give me a login to an AIX
machine. I promise not to do anything except compile/debug/test git on
it.

The GCC compile farm  has a machine
running AIX, and is free to use for anyone working on free software.
My goal is less "to work on", more, "to package" and/or "to work with". 
Most others, including IBM, seem to use gcc as compiler - which is fine. 
However, on AIX I often see side-effects introduced by the GNU run-time 
environment needed on top of the xlc.rte provided as part of AIX.


In any case - my testing is using the xlc complier - so there are syntax 
differences. the compiler has many modes - by default I use 'xlc_r' that 
has the following default settings:

xlc_r:  use    = DEFLT_C
    crt    = /lib/crt0.o
    mcrt   = /lib/mcrt0.o
    gcrt   = /lib/gcrt0.o
    libraries  = -L/usr/vac/lib,-lxlopt,-lxlipa,-lxl,-lpthreads,-lc
    proflibs   = -L/lib/profiled,-L/usr/lib/profiled
    hdlibs = -L/usr/vac/lib,-lhmd
    options    = 
-qlanglvl=extc99,-qcpluscmt,-qkeyword=inline,-qalias=ansi,-qthreaded,-D_THREAD_SAFE,-D__VACPP_MULTI__


DEFLT_C (for the curious, the default options is perhaps most 
interesting) is:

* common definitions

DEFLT_C:
    use   =DEFLT
    xlurt_cfg_path=/usr/vac/urt
    xlurt_cfg_name=urt_client.cfg

DEFLT_CPP:
    use   =DEFLT
    xlurt_cfg_path=/usr/vacpp/urt
    xlurt_cfg_name=urt_client.cfg

DEFLT:
    cppcomp   = /usr/vacpp/exe/xlCentry
    ccomp = /usr/vac/exe/xlcentry
    code  = /usr/vac/exe/xlCcode
    cpp   = /usr/vac/exe/xlCcpp
    munch = /usr/vacpp/exe/munch
    dis   = /usr/vac/exe/dis
    xlC   = /usr/vac/bin/xlc
    list  = /usr/vac/exe/xllist
    xslt  = /usr/vac/exe/XALAN
    transforms = /usr/vac/listings
    listlibs  = /usr/vac/lib
    cppinc    = /usr/vacpp/include
    ipa   = /usr/vac/exe/ipa
    cppfilt   = /usr/vacpp/bin/c++filt
    bolt  = /usr/vac/exe/bolt
    as    = /bin/as
    ld    = /bin/ld
    artool    = /bin/ar
    options   = 
-D_AIX,-D_AIX32,-D_AIX41,-D_AIX43,-D_AIX50,-D_AIX51,-D_AIX52,-D_AIX53,-D_AIX61,-D_IBMR2,-D_POWER

    options32 = -bpT:0x1000,-bpD:0x2000
    options32_bmaxdata = -bpT:0x1000,-bpD:0x3000
    options64 = -bpT:0x1,-bpD:0x11000
    ldopt = "b:o:e:u:R:H:Y:Z:L:T:A:k:j:"
    hdlibs    = -L/usr/vac/lib,-lhmd
    xlCcopt   = 
-qlanglvl=extc99,-qcpluscmt,-qkeyword=inline,-qalias=ansi

    crt_64    = /lib/crt0_64.o
    mcrt_64   = /lib/mcrt0_64.o
    gcrt_64   = /lib/gcrt0_64.o
    smplibraries = -lxlsmp
    palibraries  = -L/usr/vatools/lib,-lpahooks
    resexp = /usr/vacpp/lib/res.exp
    genexports = /usr/vac/bin/CreateExportList
    vac_path   = /usr/vac
    vacpp_path = /usr/vacpp
    xlcmp_path = /usr/vac:/usr/vacpp
    xlc_c_stdinc   = /usr/vac/include:/usr/include
    xlc_cpp_stdinc = /usr/vacpp/include:/usr/include
    xlurt_msg_cat_name=vacumsg.cat
    __GNUC_MINOR__ = 3




Andreas.

Re: git broken for AIX somewhere between 2.13.2 and 2.13.3

2018-07-29 Thread Michael


On 29/07/2018 22:06, brian m. carlson wrote:

On Sun, Jul 29, 2018 at 09:48:43PM +0200, Michael wrote:

On 29/07/2018 21:27, brian m. carlson wrote:

Well, that explains it.  I would recommend submitting a patch to
https://github.com/cr-marcstevens/sha1collisiondetection, and the we can
pull in the updated submodule with that fix.

Not sure I am smart enough to do that. I'll have to download, build, and see
what it says.

The issue is that somewhere in lib/sha1.c, you need to cause
SHA1DC_BIGENDIAN to be set.  That means you need to figure out what
compiler macro might indicate that.
I remember - roughly - a few decades back - having an assignment to 
write code to determine endianness. PDP and VAC were different iirc, and 
many other micro-processors besides the 8088/8086/z85/68k/etc..


If you are looking for a compiler macro as a way to determine this - 
maybe you have one for gcc, but not for xlc. I do not know it - currently :)



I can tell you that a POWER- or
PowerPC-specific one is going to be a bad choice unless it includes the
endianness, since those chips come in little-endian versions as well.
Actually, the POWER8 and POWER9 (and I expect all future versions) 
support both. There are not separate chips. Per virtual machine - a mode 
is determined during boot (virtual power on) e.g., SLES11 only ran in 
BigEndian and SLES12 only runs in LittleEndian. afaik, RHEL was 
supplying both BE and LE distributions. AIX, as an OS, only runs in BE 
mode, and I expect IBM i (was os/400) is also only running in BE.


_AIX might be a fine choice if you know that it only ever runs on
big-endian chips.

Do you mean just testing for _AIX. That would be very very easy - yes!

In the mean time, you could build using OpenSSL or the block SHA-1
implementation, and switch back once things are in a good state.  I do
recommend using SHA1DC for things long term, though, as attacks on SHA-1
are only going to get better.

Any suggestions on where/how to do this?

root@x066:[/data/prj/aixtools/git/git-2.13.2]./configure --help | grep -i
sha
   --sharedstatedir=DIR    modifiable architecture-independent data
[PREFIX/com]
   --datarootdir=DIR   read-only arch.-independent data root
[PREFIX/share]

root@x066:[/data/prj/aixtools/git/git-2.13.2]./configure --help | grep ssl
   --with-openssl  use OpenSSL library (default is YES)
   ARG can be prefix for openssl library and headers

If you're using configure, you can use --with-openssl, or
--with-openssl=PREFIX if your OpenSSL isn't in the standard location but
is instead in PREFIX.
I'll look in configure to see if it is not finding openssl. I was 
assuming it was found - as everything else using GNU "auto" tools find 
it okay. i.e., /var/lib/libssl.a, etc..


Tomorrow!

Re: git broken for AIX somewhere between 2.13.2 and 2.13.3

2018-07-29 Thread Michael


On 29/07/2018 21:27, brian m. carlson wrote:

On Sun, Jul 29, 2018 at 08:59:39PM +0200, Michael wrote:

On 29/07/2018 20:10, brian m. carlson wrote:

Are you using SHA1DC on that system, and does compiling with another
SHA-1 implementation help?  There was a change to the SHA1DC code big
endian detection in that commit, which might be the cause of your
problems if you're using a POWER or PowerPC system.

I was thinking it might be an 'endian' issue. So, yes - AIX runs on POWER,
only as BigEndian.

Well, that explains it.  I would recommend submitting a patch to
https://github.com/cr-marcstevens/sha1collisiondetection, and the we can
pull in the updated submodule with that fix.
Not sure I am smart enough to do that. I'll have to download, build, and 
see what it says.


In the mean time, you could build using OpenSSL or the block SHA-1
implementation, and switch back once things are in a good state.  I do
recommend using SHA1DC for things long term, though, as attacks on SHA-1
are only going to get better.

Any suggestions on where/how to do this?

root@x066:[/data/prj/aixtools/git/git-2.13.2]./configure --help | grep 
-i sha
  --sharedstatedir=DIR    modifiable architecture-independent data 
[PREFIX/com]
  --datarootdir=DIR   read-only arch.-independent data root 
[PREFIX/share]


root@x066:[/data/prj/aixtools/git/git-2.13.2]./configure --help | grep ssl
  --with-openssl  use OpenSSL library (default is YES)
  ARG can be prefix for openssl library and 
headers

Re: git broken for AIX somewhere between 2.13.2 and 2.13.3

2018-07-29 Thread Michael

On 29/07/2018 20:10, brian m. carlson wrote:

On Sun, Jul 29, 2018 at 06:44:26PM +0200, Michael wrote:

root@x066:[/tmp/xxx]git --version
git version 2.13.3
root@x066:[/tmp/xxx]git clone g...@github.com:aixtools/hello-world.git
Cloning into 'hello-world'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 3
Receiving objects: 100% (3/3), done.
fatal: pack is corrupted (SHA1 mismatch)
fatal: index-pack failed

p.s. - what surprises me re: git-2.13.2 - no messages about 'Cloning into
...', which version 2.13.1 did give.

I guess a bisect is the next step - between version 2.13.2 and 2.13.3. Other
suggestions welcome!

Are you using SHA1DC on that system, and does compiling with another
SHA-1 implementation help? There was a change to the SHA1DC code big
endian detection in that commit, which might be the cause of your
problems if you're using a POWER or PowerPC system.

I was thinking it might be an 'endian' issue. So, yes - AIX runs on
POWER, only as BigEndian.

git bisect returns:

michael@x071:[/data/prj/aixtools/git/github/git-master]git bisect bad
Bisecting: 1 revision left to test after this (roughly 1 step)
[35049a2343948f686861e176a8c395f9f67da7b6] Merge branch
'aw/contrib-subtree-doc-asciidoctor' into maint

michael@x071:[/data/prj/aixtools/git/github/git-master]git bisect good
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[9936c1b52a39fa14fca04f937df3e75f7498ac66] sha1dc: update from upstream

michael@x071:[/data/prj/aixtools/git/github/git-master]git bisect bad
9936c1b52a39fa14fca04f937df3e75f7498ac66 is the first bad commit
commit 9936c1b52a39fa14fca04f937df3e75f7498ac66
Author: Ævar Arnfjörð Bjarmason
Date: Sat Jul 1 22:05:45 2017 +

sha1dc: update from upstream

Update sha1dc from the latest version by the upstream maintainer[1].

See commit 6b851e536b ("sha1dc: update from upstream", 2017-06-06) for
the last update.

This solves the Big Endian detection on Solaris reported against
v2.13.2[2], hopefully without any regressions. A version of this has
been tested on two Solaris SPARC installations, Cygwin (by jturney on
cygwin@Freenode), and on numerous more boring systems (mainly
linux/x86_64). See [3] for a discussion of the implementation and
platform-specific issues.

See commit a0103914c2 ("sha1dc: update from upstream", 2017-05-20) and
6b851e536b ("sha1dc: update from upstream", 2017-06-06) for previous
attempts in the 2.13 series to address various compile-time feature
detection in this library.

1.
https://github.com/cr-marcstevens/sha1collisiondetection/commit/19d97bf5af05312267c2e874ee6bcf584d9e9681

(https://public-inbox.org/git/CAKKM46tHq13XiW5C8sux3=pz1vhsu_npg8exfwwcpd7rkzk...@mail.gmail.com/)

3. https://github.com/cr-marcstevens/sha1collisiondetection/pull/34

Signed-off-by: Ævar Arnfjörð Bjarmason
Signed-off-by: Junio C Hamano

:04 04 a84797967fb742e4ca9618a641d53ce3a6c6589b
32efa656d78901da961e4a47d84b6d82fede064b M sha1dc

git broken for AIX somewhere between 2.13.2 and 2.13.3

2018-07-29 Thread Michael


Hi,

Several years ago I had downloaded and packaged git-2.10.1 with no real 
issues, and it has been working fine. Deciding it was time for an update 
I downloaded git-2.18.0 and built from scratch.


After testing lots of version - the the last 2.12 version worked; 
git-2.13.2 works but git-2.13.3 does not.


root@x066:[/tmp/xxx]git clone g...@github.com:aixtools/hello-world.git
root@x066:[/tmp/xxx]git --version
git version 2.13.2
root@x066:[/tmp/xxx]ls -l
total 0
drwxr-xr-x    3 root system  256 Jul 29 16:37 hello-world

root@x066:[/tmp/xxx]git --version
git version 2.13.3
root@x066:[/tmp/xxx]git clone g...@github.com:aixtools/hello-world.git
Cloning into 'hello-world'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 3
Receiving objects: 100% (3/3), done.
fatal: pack is corrupted (SHA1 mismatch)
fatal: index-pack failed

p.s. - what surprises me re: git-2.13.2 - no messages about 'Cloning 
into ...', which version 2.13.1 did give.


I guess a bisect is the next step - between version 2.13.2 and 2.13.3. 
Other suggestions welcome!


Michael

Re: [PATCH 00/17] object_id part 14

2018-07-14 Thread Michael Haggerty

On Mon, Jul 9, 2018 at 6:15 AM Derrick Stolee  wrote:
> On 7/8/2018 11:12 PM, Jacob Keller wrote:
> > On Sun, Jul 8, 2018 at 4:39 PM brian m. carlson
> >  wrote:
> >> This is the fourteenth series of patches to switch to using struct
> >> object_id and the_hash_algo.  This series converts several core pieces
> >> to use struct object_id, including the oid* and hex functions.
> >>
> >> All of these patches have been tested with both SHA-1 and a 256-bit
> >> hash.
> >>
> > I read through the series, and didn't spot anything odd, except for
> > the question about reasoning for why we use memcmp directly over using
> > hashcmp. I don't think that's any sort of blocker, it just seemed an
> > odd decision to me.
>
> I also read through the series and only found the 100/200 constants
> confusing. Not worth blocking on, but I'm CC'ing Michael Haggerty to
> comment if he knows how the magic 100 was computed.

The magic 100 blames back to our chief magician, Junio:

8ac65937d0 Make sure we do not write bogus reflog entries. (2007-01-26)

Since then, as far as I can tell, it's just been copy-pasted forward.
It would be easy to compute it precisely based on the length of the
two OIDs, represented as hex strings, plus the few extra characters in
the format string.

Michael

Re: [PATCH] xdiff: reduce indent heuristic overhead

2018-07-03 Thread Michael Haggerty

On Mon, Jul 2, 2018 at 7:27 PM Stefan Beller  wrote:
> On Sun, Jul 1, 2018 at 8:57 AM Michael Haggerty  wrote:
> [...]
> So this suggests to have MAX_BORING to be
> "hunk size + some small constant offset" ?

That would be my suggestion, yes. There are cases where it will be
more expensive than a fixed `MAX_BORING`, but I bet on average it will
be faster. Plus, it should always give the right answer.

Michael

Re: [PATCH] xdiff: reduce indent heuristic overhead

2018-07-01 Thread Michael Haggerty

On 06/29/2018 10:28 PM, Stefan Beller wrote:
> [...]
> Adds some threshold to avoid expensive cases, like:
> 
> ```
> #!python
> open('a', 'w').write(" \n" * 100)
> open('b', 'w').write(" \n" * 101)
> ```
> 
> The indent heuristic is O(N * 20) (N = 100) for the above case, and
> makes diff much slower.
> [...]
> +/*
> + * For indentation heuristic, skip searching for better slide position after
> + * checking MAX_BORING lines without finding an improvement. This defends the
> + * indentation heuristic logic against pathological cases. The value is not
> + * picked scientifically but should be good enough.
> + */
> +#define MAX_BORING 100

This is an interesting case, and a speed difference of almost a factor
of five seems impressive. But this is a pretty pathological case, isn't
it? And I'm pretty sure that the algorithm is `O(N)` both before and
after this change. Remember that to find `earliest_end` and `g.end`,
there has already been a scan through all 100 lines. In other words,
you're not improving how the overall algorithm scales with `N`; you're
only changing the constant factor in front. So it's a little bit
questionable whether it is worth complicating the code for this unusual
case.

But *if* we want to improve this case, I think that we could be smarter
about it.

By the time we get to this point in the code, we already know that there
is a "slider" hunk of length `M` (`groupsize`) that can be slid up or
down within a range of `N` (`g.end - earliest_end + 1`) possible
positions. The interesting case here is `N ≫ M`, because then naively
the number of positions to try out is a lot bigger than the size of the
hunk itself. (In the case described above, `N` is 100 and `M` is 1.)

But how can that situation even arise? Remember, a hunk can only be slid
down by a line if the first line *after* the hunk is identical to the
first line *of* the hunk. It follows that if you shift a hunk down `M`
lines, then it has the same contents as when you started—you've just
rotated all of the hunk lines around once.

So if `N ≫ M`, there is necessarily a lot of repetition among the `N +
M` lines that the hunk could possibly overlay. Specifically, it must
consist of `floor((N + M)/M)` identical copies of the hunk, plus
possibly a few leftover lines constituting the start of another repetition.

Given this large amount of repetition, it seems to me that there is
never a need to scan more than the bottom `M + 1` possible positions [1]
plus the highest possible position [2] to be sure of finding the very
best one. In the pathological case that you described above, where `M`
is 1, only three positions have to be evaluated, not 100.

In fact, it *could* be that there is even more repetition, namely if the
hunk itself contains multiple copies of an even shorter block of `K`
lines. In that case, you would only have to scan `K + 1` positions at
the bottom plus one at the top to be sure to find the best hunk
position. This would be an interesting optimization for a case like

> open('a', 'w').write(" \n" * 100)
> open('b', 'w').write(" \n" * 110)

(`N = 100`, `M = 10`, `K = 1`) or

> open('a', 'w').write("\nMISSING\n\n" * 100)
> open('b', 'w').write("\nMISSING\n\n" * 110)

(`N = 300`, `M = 30`, `K = 3`). On the other hand, it's not
entirely trivial to find periodicity in a group of lines (i.e., to find
`K`), and I don't know offhand how that task scales with `M`.

Michael

[1] Actually, to be rigorously correct it might be necessary to check
even a bit more than `M + 1` positions at the bottom because the
heuristic looks a bit beyond the lines of the hunk.

[2] The position at the top has different predecessor lines than the
other positions, so it could have a lower score than all of the others.
It's worth checking it. Here too, to be rigorously correct it might be
necessary to check more than one position at the top because the
heuristic looks a bit beyond the lines of the hunk.

[PATCH v2] filter-branch: skip commits present on --state-branch

2018-06-25 Thread Michael Barabanov

The commits in state:filter.map have already been processed, so don't
filter them again. This makes incremental git filter-branch much faster.

Also add tests for --state-branch option.

Signed-off-by: Michael Barabanov 
---
 git-filter-branch.sh |  1 +
 t/t7003-filter-branch.sh | 15 +++
 2 files changed, 16 insertions(+)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index ccceaf19a..5c5afa2b9 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -372,6 +372,7 @@ while read commit parents; do
git_filter_branch__commit_count=$(($git_filter_branch__commit_count+1))
 
report_progress
+   test -f "$workdir"/../map/$commit && continue
 
case "$filter_subdir" in
"")
diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh
index ec4b160dd..e23de7d0b 100755
--- a/t/t7003-filter-branch.sh
+++ b/t/t7003-filter-branch.sh
@@ -107,6 +107,21 @@ test_expect_success 'test that the directory was renamed' '
test dir/D = "$(cat diroh/D.t)"
 '
 
+V=$(git rev-parse HEAD)
+
+test_expect_success 'populate --state-branch' '
+   git filter-branch --state-branch state -f --tree-filter "touch file || 
:" HEAD
+'
+
+W=$(git rev-parse HEAD)
+
+test_expect_success 'using --state-branch to skip already rewritten commits' '
+   test_when_finished git reset --hard $V &&
+   git reset --hard $V &&
+   git filter-branch --state-branch state -f --tree-filter "touch file || 
:" HEAD &&
+   test_cmp_rev $W HEAD
+'
+
 git tag oldD HEAD~4
 test_expect_success 'rewrite one branch, keeping a side branch' '
git branch modD oldD &&
-- 
2.18.0

[PATCH] filter-branch: skip commits present on --state-branch

2018-06-22 Thread Michael Barabanov

The commits in state:filter.map have already been processed, so don't
filter them again. This makes incremental git filter-branch much faster.

Also add tests for --state-branch option.

Signed-off-by: Michael Barabanov 
---
 git-filter-branch.sh |  3 +++
 t/t7003-filter-branch.sh | 15 +++
 2 files changed, 18 insertions(+)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index ccceaf19a..2df7ed107 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -372,6 +372,9 @@ while read commit parents; do
git_filter_branch__commit_count=$(($git_filter_branch__commit_count+1))
 
report_progress
+   if test -r "$workdir/../map/$commit"; then
+   continue
+   fi
 
case "$filter_subdir" in
"")
diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh
index ec4b160dd..e23de7d0b 100755
--- a/t/t7003-filter-branch.sh
+++ b/t/t7003-filter-branch.sh
@@ -107,6 +107,21 @@ test_expect_success 'test that the directory was renamed' '
test dir/D = "$(cat diroh/D.t)"
 '
 
+V=$(git rev-parse HEAD)
+
+test_expect_success 'populate --state-branch' '
+   git filter-branch --state-branch state -f --tree-filter "touch file || 
:" HEAD
+'
+
+W=$(git rev-parse HEAD)
+
+test_expect_success 'using --state-branch to skip already rewritten commits' '
+   test_when_finished git reset --hard $V &&
+   git reset --hard $V &&
+   git filter-branch --state-branch state -f --tree-filter "touch file || 
:" HEAD &&
+   test_cmp_rev $W HEAD
+'
+
 git tag oldD HEAD~4
 test_expect_success 'rewrite one branch, keeping a side branch' '
git branch modD oldD &&
-- 
2.17.1

Re: [PATCH] t9104: kosherly remove remote refs

2018-06-02 Thread Michael Haggerty

On Fri, Jun 1, 2018 at 7:08 AM, Christian Couder
 wrote:
> As there are plans to implement other ref storage systems,
> let's use a way to remove remote refs that does not depend
> on refs being files.
>
> This makes it clear to readers that this test does not
> depend on which ref backend is used.
>
> Suggested-by: Michael Haggerty 
> Helped-by: Jeff King 
> Signed-off-by: Christian Couder 
> ---
> This was suggested and discussed in:
>
> https://public-inbox.org/git/20180525085906.ga2...@sigill.intra.peff.net/
>
>  t/t9104-git-svn-follow-parent.sh | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/t/t9104-git-svn-follow-parent.sh 
> b/t/t9104-git-svn-follow-parent.sh
> index 9c49b6c1fe..5e0ad19177 100755
> --- a/t/t9104-git-svn-follow-parent.sh
> +++ b/t/t9104-git-svn-follow-parent.sh
> @@ -215,7 +215,9 @@ test_expect_success "multi-fetch continues to work" "
> "
>
>  test_expect_success "multi-fetch works off a 'clean' repository" '
> -   rm -rf "$GIT_DIR/svn" "$GIT_DIR/refs/remotes" &&
> +   rm -rf "$GIT_DIR/svn" &&
> +   git for-each-ref --format="option no-deref%0adelete %(refname)" 
> refs/remotes |
> +   git update-ref --stdin &&
> git reflog expire --all --expire=all &&
> mkdir "$GIT_DIR/svn" &&
> git svn multi-fetch
> --
> 2.17.0.1035.g12039e008f

+1 LGTM.

Michael

Re: [PATCH 1/3] refs/packed-backend.c: close fd of empty file

2018-05-31 Thread Michael Haggerty

On Wed, May 30, 2018 at 7:03 PM, Stefan Beller  wrote:
> Signed-off-by: Stefan Beller 
> ---
>
> This was an oversight in 01caf20d57a (load_contents(): don't try to mmap an
> empty file, 2018-01-24).
>
> This and the following 2 patches apply on master.
>
>  refs/packed-backend.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index cec3fb9e00f..d447a731da0 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -499,6 +499,7 @@ static int load_contents(struct snapshot *snapshot)
> size = xsize_t(st.st_size);
>
> if (!size) {
> +   close(fd);
> return 0;
> } else if (mmap_strategy == MMAP_NONE || size <= SMALL_FILE_SIZE) {
> snapshot->buf = xmalloc(size);
> --
> 2.17.1.1185.g55be947832-goog
>

+1.

Thanks,
Michael

Re: [PATCH] t990X: use '.git/objects' as 'deep inside .git' path

2018-05-26 Thread Michael Haggerty

On Sat, May 26, 2018 at 8:47 AM, Christian Couder
<christian.cou...@gmail.com> wrote:
> Tests t9902-completion.sh and t9903-bash-prompt.sh each have tests
> that check what happens when we are "in the '.git' directory" and
> when we are "deep inside the '.git' directory".
>
> To test the case when we are "deep inside the '.git' directory" the
> test scripts used to perform a `cd .git/refs/heads`.
>
> As there are plans to implement other ref storage systems, let's
> use '.git/objects' instead of '.git/refs/heads' as the "deep inside
> the '.git' directory" path.

Seems reasonable to me. +1.

Michael

Re: [PATCH v2] t: make many tests depend less on the refs being files

2018-05-25 Thread Michael Haggerty

On Fri, May 25, 2018 at 10:59 AM, Jeff King <p...@peff.net> wrote:
> On Fri, May 25, 2018 at 10:48:04AM +0200, Michael Haggerty wrote:
>
>> >  test_expect_success "multi-fetch works off a 'clean' repository" '
>> > -   rm -r "$GIT_DIR/svn" "$GIT_DIR/refs/remotes" "$GIT_DIR/logs" &&
>> > +   rm -rf "$GIT_DIR/svn" "$GIT_DIR/refs/remotes" &&
>> > +   git reflog expire --all --expire=all &&
>> > mkdir "$GIT_DIR/svn" &&
>> > git svn multi-fetch
>> > '
>> >
>>
>> `rm -rf "$GIT_DIR/refs/remotes"` is not kosher. I think it can be written
>>
>> printf 'option no-deref\ndelete %s\n' $(git for-each-ref
>> --format='%(refname)' refs/remotes) | git update-ref --stdin
>>
>> as long as the number of references doesn't exceed command-line limits.
>> This will also take care of the reflogs. Another alternative would be to
>> write it as a loop.
>
> Perhaps:
>
>   git for-each-ref --format="option no-deref%0adelete %(refname)" 
> refs/remotes |
>   git update-ref --stdin

Ah yes, that's nicer. I tried with `\n`, but that's not supported
(wouldn't it be nice if it were?). I didn't think to try `%0a` (let
alone look in the documentation!)

Michael

Re: [PATCH v2] t: make many tests depend less on the refs being files

2018-05-25 Thread Michael Haggerty

 echo $BTIP > .git/refs/heads/B;;
> + git update-ref refs/heads/B "$BTIP";;
>   esac &&
>   git symbolic-ref HEAD refs/heads/$(echo $heads \
>   | sed -e "s/^\(.\).*$/\1/") &&
> @@ -92,8 +92,8 @@ test_expect_success 'setup' '
>   cur=$(($cur+1))
>   done &&
>   add B1 $A1 &&
> - echo $ATIP > .git/refs/heads/A &&
> - echo $BTIP > .git/refs/heads/B &&
> + git update-ref refs/heads/A "$ATIP" &&
> + git update-ref refs/heads/B "$BTIP" &&
>   git symbolic-ref HEAD refs/heads/B
>  '
>  
> diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> index ae5a530a2d..e402aee6a2 100755
> --- a/t/t5510-fetch.sh
> +++ b/t/t5510-fetch.sh
> @@ -63,7 +63,7 @@ test_expect_success "fetch test" '
>   git commit -a -m "updated by origin" &&
>   cd two &&
>   git fetch &&
> - test -f .git/refs/heads/one &&
> + git rev-parse --verify refs/heads/one &&
>   mine=$(git rev-parse refs/heads/one) &&
>   his=$(cd ../one && git rev-parse refs/heads/master) &&
>   test "z$mine" = "z$his"
> @@ -73,8 +73,8 @@ test_expect_success "fetch test for-merge" '
>   cd "$D" &&
>   cd three &&
>   git fetch &&
> - test -f .git/refs/heads/two &&
> - test -f .git/refs/heads/one &&
> + git rev-parse --verify refs/heads/two &&
> + git rev-parse --verify refs/heads/one &&
>   master_in_two=$(cd ../two && git rev-parse master) &&
>   one_in_two=$(cd ../two && git rev-parse one) &&
>   {
> diff --git a/t/t6010-merge-base.sh b/t/t6010-merge-base.sh
> index 31db7b5f91..aa2d360ce3 100755
> --- a/t/t6010-merge-base.sh
> +++ b/t/t6010-merge-base.sh
> @@ -34,7 +34,7 @@ doit () {
>  
>   commit=$(echo $NAME | git commit-tree $T $PARENTS) &&
>  
> - echo $commit >.git/refs/tags/$NAME &&
> + git update-ref "refs/tags/$NAME" "$commit" &&
>   echo $commit
>  }
>  
> diff --git a/t/t7201-co.sh b/t/t7201-co.sh
> index 76c223c967..ab9da61da3 100755
> --- a/t/t7201-co.sh
> +++ b/t/t7201-co.sh
> @@ -65,7 +65,7 @@ test_expect_success setup '
>  test_expect_success "checkout from non-existing branch" '
>  
>   git checkout -b delete-me master &&
> - rm .git/refs/heads/delete-me &&
> + git update-ref -d --no-deref refs/heads/delete-me &&
>   test refs/heads/delete-me = "$(git symbolic-ref HEAD)" &&
>   git checkout master &&
>   test refs/heads/master = "$(git symbolic-ref HEAD)"
> diff --git a/t/t9104-git-svn-follow-parent.sh 
> b/t/t9104-git-svn-follow-parent.sh
> index a735fa3717..9c49b6c1fe 100755
> --- a/t/t9104-git-svn-follow-parent.sh
> +++ b/t/t9104-git-svn-follow-parent.sh
> @@ -215,7 +215,8 @@ test_expect_success "multi-fetch continues to work" "
>   "
>  
>  test_expect_success "multi-fetch works off a 'clean' repository" '
> - rm -r "$GIT_DIR/svn" "$GIT_DIR/refs/remotes" "$GIT_DIR/logs" &&
> + rm -rf "$GIT_DIR/svn" "$GIT_DIR/refs/remotes" &&
> + git reflog expire --all --expire=all &&
>   mkdir "$GIT_DIR/svn" &&
>   git svn multi-fetch
>   '
> 

`rm -rf "$GIT_DIR/refs/remotes"` is not kosher. I think it can be written

printf 'option no-deref\ndelete %s\n' $(git for-each-ref
--format='%(refname)' refs/remotes) | git update-ref --stdin

as long as the number of references doesn't exceed command-line limits.
This will also take care of the reflogs. Another alternative would be to
write it as a loop.

Michael

Re: commit-graph: change in "best" merge-base when ambiguous

2018-05-25 Thread Michael Haggerty

On 05/25/2018 12:08 AM, Jakub Narebski wrote:
> Derrick Stolee <sto...@gmail.com> writes:
>> On 5/22/2018 1:39 AM, Michael Haggerty wrote:
>>> On 05/21/2018 08:10 PM, Derrick Stolee wrote:
>>>> [...]
>>> This may be beyond the scope of what you are working on, but there are
>>> significant advantages to selecting a "best" merge base from among the
>>> candidates. Long ago [1] I proposed that the "best" merge base is the
>>> merge base candidate that minimizes the number of non-merge commits that
>>> are in
>>>
>>>  git rev-list $candidate..$branch
>>>
>>> that are already in master:
>>>
>>>  git rev-list $master
>>>
>>> (assuming merging branch into master), which is equivalent to choosing
>>> the merge base that minimizes
>>>
>>>  git rev-list --count $candidate..$branch
> 
> Is the above correct...
> 
>>> In fact, this criterion is symmetric if you exchange branch ↔ master,
>>> which is a nice property, and indeed generalizes pretty simply to
>>> computing the merge base of more than two commits.
> 
> ...as it doesn't seem to have the described symmetry.

The first email that I referenced [1] demonstrates this in the section
"Symmetry; generalization to more than two branches". The same thing is
demonstrated in a simpler way using set notation in a later email in
that thread [2].

Michael

[1] https://public-inbox.org/git/539a25bf.4060...@alum.mit.edu/
[2] https://public-inbox.org/git/53a06264.9080...@alum.mit.edu/

Re: commit-graph: change in "best" merge-base when ambiguous

2018-05-21 Thread Michael Haggerty

On 05/21/2018 08:10 PM, Derrick Stolee wrote:
> [...]
> In the Discussion section of the `git merge-base` docs [1], we have the
> following:
> 
>     When the history involves criss-cross merges, there can be more than
> one best common ancestor for two commits. For example, with this topology:
> 
>     ---1---o---A
>         \ /
>      X
>         / \
>     ---2---o---o---B
> 
>     both 1 and 2 are merge-bases of A and B. Neither one is better than
> the other (both are best merge bases). When the --all option is not
> given,     it is unspecified which best one is output.
> 
> This means our official documentation mentions that we do not have a
> concrete way to differentiate between these choices. This makes me think
> that this change in behavior is not a bug, but it _is_ a change in
> behavior. It's worth mentioning, but I don't think there is any value in
> making sure `git merge-base` returns the same output.
> 
> Does anyone disagree? Is this something we should solidify so we always
> have a "definitive" merge-base?
> [...]

This may be beyond the scope of what you are working on, but there are
significant advantages to selecting a "best" merge base from among the
candidates. Long ago [1] I proposed that the "best" merge base is the
merge base candidate that minimizes the number of non-merge commits that
are in

git rev-list $candidate..$branch

that are already in master:

git rev-list $master

(assuming merging branch into master), which is equivalent to choosing
the merge base that minimizes

git rev-list --count $candidate..$branch

In fact, this criterion is symmetric if you exchange branch ↔ master,
which is a nice property, and indeed generalizes pretty simply to
computing the merge base of more than two commits.

In that email I also included some data showing that the "best" merge
base almost always results in either the same or a shorter diff than the
more or less arbitrary algorithm that we currently use. Sometimes the
difference in diff length is dramatic.

To me it feels like the best *deterministic* merge base would be based
on the above criterion, maybe with first-parent reachability, commit
times, and SHA-1s used (in that order) to break ties.

I don't plan to work on the implementation of this idea myself (though
we've long used a script-based implementation of this algorithm
internally at GitHub).

Michael

[1] https://public-inbox.org/git/539a25bf.4060...@alum.mit.edu/
See the rest of the thread for more interesting discussion.
[2]
https://public-inbox.org/git/8a9b3f20-eed2-c59b-f7ea-3c68b3c30...@alum.mit.edu/
Higher in this thread, Junio proposes a different criterion.

Re: Implementing reftable in Git

2018-05-11 Thread Michael Haggerty

On Wed, May 9, 2018 at 4:33 PM, Christian Couder
<christian.cou...@gmail.com> wrote:
> I might start working on implementing reftable in Git soon.
> [...]

Nice. It'll be great to have a reftable implementation in git core
(and ideally libgit2, as well). It seems to me that it could someday
become the new default reference storage method. The file format is
considerably more complicated than the current loose/packed scheme,
which is definitely a disadvantage (for example, for other Git
implementations). But implementing it *with good performance and
without races* might be no more complicated than the current scheme.

Testing will be important. There are already many tests specifically
about testing loose/packed reference storage. These will always have
to run against repositories that are forced to use that reference
scheme. And there will need to be new tests specifically about the
reftable scheme. Both classes of tests should be run every time. That
much is pretty obvious.

But currently, there are a lot of tests that assume the loose/packed
reference format on disk even though the tests are not really related
to references at all. ISTM that these should be converted to work at a
higher level, for example using `for-each-ref`, `rev-parse`, etc. to
examine references rather than reading reference files directly. That
way the tests should run correctly regardless of which scheme is in
use.

And since it's too expensive to run the whole test suite with both
reference storage schemes, it seems to me that the reference storage
scheme that is used while running the scheme-neutral tests should be
easy to choose at runtime.

David Turner did some analogous work for wiring up and testing his
proposed LMDB ref storage backend that might be useful [1]. I'm CCing
him, since he might have thoughts on this topic.

Regarding the reftable spec itself:

I recently gave a little internal talk about it, and while preparing
the talk I noticed a couple of things that should maybe be tweaked:

* The spec proposes to change `$GIT_DIR/refs`, which is currently a
directory that holds the loose refs, into a file that holds the table
of contents of reftable files comprising the full set of references.
This was my suggestion. I was thinking that this would prevent old
refs code from being used accidentally on a reftable-enabled
repository, while still enabling old versions of Git recognize this as
a git directory [2]. I think that the latter is important to make
things like `git rev-parse --git-dir` work correctly, even if the
installed version of git can't actually *read* the repository.

The problem is that `is_git_directory()` checks not only whether
`$GIT_DIR/refs` exists, but also whether it is executable (i.e., since
it is normally a directory, that it is searchable). It would be silly
to make the reftable table of contents executable, so this doesn't
seem like a good approach after all.

So probably `$GIT_DIR/refs` should continue to be a directory. If
it's there, it would probably make sense to place the reftable files
and maybe the ToC inside of it. We would have to rely on older Git
versions refusing to work in the directory because its `config` file
has an unrecognized `core.repositoryFormatVersion`, but that should be
OK I think.

* The scheme for naming reftable files [3] is, I believe, just a
suggestion as far as the spec is concerned (except for the use of
`.ref`/`.log` file extensions). It might be more less unwieldy to use
`%d` rather than `%08d`, and more convenient to name compacted files
to `${min_update_index}-${max_update_index}_${n}.{ref,log}` to make it
clearer to see by inspection what each file contains. That would also
make it unnecessary, in most cases, to insert a `_${n}` to make the
filename unique.

Michael

[1] https://github.com/dturner-tw/git/tree/dturner/pluggable-backends
[2]
https://github.com/git/git/blob/ccdcbd54c4475c2238b310f7113ab3075b5abc9c/setup.c#L309-L347
[3]
https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md#layout

https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md#compaction
[4]
https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md#footer

Re: [PATCH] refs: handle null-oid for pseudorefs

2018-05-07 Thread Michael Haggerty

On 05/06/2018 03:35 PM, Martin Ågren wrote:
> According to the documentation on `git update-ref`, it is possible to
> "specify 40 '0' or an empty string as  to make sure that the
> ref you are creating does not exist." But in the code for pseudorefs, we
> do not implement this. If we fail to read the old ref, we immediately
> die. A failure to read would actually be a good thing if we have been
> given the null-oid.
> 
> With the null-oid, allow -- and even require -- the ref-reading to fail.
> This implements the "make sure that the ref ... does not exist" part of
> the documentation.
> 
> Since we have a `strbuf err` for collecting errors, let's use it and
> signal an error to the caller instead of dying hard.
> 
> Reported-by: Rafael Ascensão <rafa.al...@gmail.com>
> Helped-by: Rafael Ascensão <rafa.al...@gmail.com>
> Signed-off-by: Martin Ågren <martin.ag...@gmail.com>

Thanks for the patch. This looks good to me. But it it seems that the
test coverage related to pseudorefs is still not great. Ideally, all of
the following combinations should be tested:

Pre-update value   | ref-update old OID   | Expected result
---|--|
missing| missing  | accept *
missing| value| reject
set| missing  | reject *
set| correct value| accept
set| wrong value  | reject

I think your test only covers the lines with asterisks. Are the other
scenarios already covered by other tests? If not, how about adding them?
That would give us confidence that the new code works in all circumstances.

Michael

Re: cover letter cc's [was: [PATCH 60/67] hw/s390x: add include directory headers]

2018-05-04 Thread Michael S. Tsirkin

On Fri, May 04, 2018 at 08:07:53AM -0500, Eric Blake wrote:
> [adding a cross-post to the git mailing list]
> 
> On 05/04/2018 02:10 AM, Cornelia Huck wrote:
> > On Thu, 3 May 2018 22:51:40 +0300
> > "Michael S. Tsirkin" <m...@redhat.com> wrote:
> > 
> > > This way they are easier to find using standard rules.
> > > 
> > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > > ---
> ...
> 
> > [Goes to find cover letter to figure out what this is all about.
> > *Please*, cc: people on the cover letter so they can see immediately
> > what this is trying to do!]
> 
> Is there an EASY way to make 'git format-patch --cover-letter $commitid'
> (and git send-email, by extension) automatically search for all cc's any any
> of the N/M patches, and auto-cc ALL of those recipients on the 0/N cover
> letter?  And if that is not something easily built into git format-patch
> directly, is it something that can easily be added to sendemail.cccmd?  This
> is not the first time that someone has complained that automatic cc's are
> not sending the cover letter context to a particular maintainer interested
> (and auto-cc'd) in only a subset of an overall series.
> 
> On the other hand, cc'ing all recipients for a largely mechanical patch
> series that was split into 67 parts, in part because it touches so many
> different maintainers' areas, may make the cover letter have so many
> recipients that various mail gateways start rejecting it as potential spam.

I do this sometimes (pipe to this script):

grep -e ^Signed-off-by -e ^Acked -e ^Reported -e ^Tested -e ^Cc | sed 
's/.*.*//'|sort | uniq | sed 's/^/Cc: /'


> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org

[PATCH] show: add --follow-symlinks option for :

2018-04-16 Thread Michael Vogt

Add a --follow-symlinks option that'll resolve symlinks to their
targets when the target is of the form :.

Without it, git will show the path of the link itself if the symlink
is the leaf node of , or otherwise an error if some component
of  is a symlink to another location in the repository. With
the new --follow-symlinks option both will be resolved to their
target, and its content shown instead.

Signed-off-by: Michael Vogt <m...@ubuntu.com>
---
 Documentation/git-show.txt |  7 +++
 builtin/log.c  |  6 +-
 revision.c |  2 ++
 revision.h |  1 +
 t/t1800-git-show.sh| 41 ++
 5 files changed, 56 insertions(+), 1 deletion(-)
 create mode 100755 t/t1800-git-show.sh

diff --git a/Documentation/git-show.txt b/Documentation/git-show.txt
index e73ef5401..e2634b27e 100644
--- a/Documentation/git-show.txt
+++ b/Documentation/git-show.txt
@@ -39,6 +39,13 @@ OPTIONS
For a more complete list of ways to spell object names, see
"SPECIFYING REVISIONS" section in linkgit:gitrevisions[7].
 
+--follow-symlinks::
+   Follow symlinks inside the repository when requesting objects
+   in extended revision syntax of the form tree-ish:path-in-tree.
+   It will resolve any symlinks in  and shows the
+   content of the link if the symlink is the leaf node of
+   .
+
 include::pretty-options.txt[]
 
 
diff --git a/builtin/log.c b/builtin/log.c
index 94ee177d5..7d815b8ea 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -142,7 +142,7 @@ static void cmd_log_init_finish(int argc, const char 
**argv, const char *prefix,
 struct rev_info *rev, struct setup_revision_opt *opt)
 {
struct userformat_want w;
-   int quiet = 0, source = 0, mailmap = 0;
+   int quiet = 0, source = 0, mailmap = 0, follow_symlinks = 0;
static struct line_opt_callback_data line_cb = {NULL, NULL, 
STRING_LIST_INIT_DUP};
static struct string_list decorate_refs_exclude = 
STRING_LIST_INIT_NODUP;
static struct string_list decorate_refs_include = 
STRING_LIST_INIT_NODUP;
@@ -162,6 +162,8 @@ static void cmd_log_init_finish(int argc, const char 
**argv, const char *prefix,
OPT_CALLBACK('L', NULL, _cb, "n,m:file",
 N_("Process line range n,m in file, counting from 
1"),
 log_line_range_callback),
+   OPT_BOOL(0, "follow-symlinks", _symlinks,
+N_("follow in-tree symlinks (used when showing file 
content)")),
OPT_END()
};
 
@@ -176,6 +178,8 @@ static void cmd_log_init_finish(int argc, const char 
**argv, const char *prefix,
 
if (quiet)
rev->diffopt.output_format |= DIFF_FORMAT_NO_OUTPUT;
+   if (follow_symlinks)
+   rev->follow_symlinks = 1;
argc = setup_revisions(argc, argv, rev, opt);
 
/* Any arguments at this point are not recognized */
diff --git a/revision.c b/revision.c
index b42c836d7..4ab22313f 100644
--- a/revision.c
+++ b/revision.c
@@ -1678,6 +1678,8 @@ int handle_revision_arg(const char *arg_, struct rev_info 
*revs, int flags, unsi
 
if (revarg_opt & REVARG_COMMITTISH)
get_sha1_flags |= GET_OID_COMMITTISH;
+   if (revs && revs->follow_symlinks)
+   get_sha1_flags |= GET_OID_FOLLOW_SYMLINKS;
 
if (get_oid_with_context(arg, get_sha1_flags, , ))
return revs->ignore_missing ? 0 : -1;
diff --git a/revision.h b/revision.h
index b8c47b98e..060f1038a 100644
--- a/revision.h
+++ b/revision.h
@@ -122,6 +122,7 @@ struct rev_info {
first_parent_only:1,
line_level_traverse:1,
tree_blobs_in_commit_order:1,
+   follow_symlinks:1,
 
/* for internal use only */
exclude_promisor_objects:1;
diff --git a/t/t1800-git-show.sh b/t/t1800-git-show.sh
new file mode 100755
index 0..7a02438ec
--- /dev/null
+++ b/t/t1800-git-show.sh
@@ -0,0 +1,41 @@
+#!/bin/sh
+
+test_description='Test git show works'
+
+. ./test-lib.sh
+
+test_expect_success 'verify git show HEAD:foo works' '
+   test_commit A &&
+   git show HEAD:A.t >actual &&
+   echo A >expected &&
+   test_cmp expected actual
+'
+
+test_expect_success SYMLINKS 'verify git show HEAD:symlink shows symlink 
points to foo' '
+   ln -s A.t A.link &&
+   git add A.link &&
+   git commit -m"Added symlink to A.t" &&
+   git show HEAD:A.link >actual &&
+   printf "%s" A.t >expected &&
+   test_cmp expected actual
+'
+
+test_expect_success SYMLINKS 'verify git show --follow-symlinks HEAD:symlink 
shows foo' '
+   git show --follow-symlin

[PATCH v2] show: add --follow-symlinks option for :

2018-04-16 Thread Michael Vogt

Updated version of the `git show --follow-symlink` patch. This version
includes the feedback from Ævar Arnfjörð Bjarmason and Stefan Beller:

- commit message updated
- test fixes merged from Ævar (thanks!)
- Documentation/git-show.txt clarified

It does not include a test for --follow-symlinks in the dirlink case
when a file is behind a dirlink symlink. This this currently fails with
a non-descriptive error message. I hope to find time to improve this
error message at some point and then a test for this will be added.

It also does not include a test for a repo with symlinks when running
on a system without SYMLINKS. I don't have access to such a system,
sorry.

Re: [RFC PATCH] Add "git show --follow-symlinks HEAD:symlink"

On Fri, Apr 13, 2018 at 09:33:00PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Apr 13 2018, Michael Vogt wrote:
> 
> > The update patch is attached as an inline attachement.
> 
> Your patch still just shows up as a straight-up attachment in many
> E-Mail clients. Note the difference between what your patch
[..]
> This is why Documentation/SubmittingPatches suggests using format-patch
> & send-email. You don't *have* to use those tools, and can use something
> that's compatible with what's expected on-list, but what you're doing
> isn't that.

My apologizes, I resend it using "git send-email".

Cheers,
 Michael

[PATCH] support: git show --follow-symlinks HEAD:symlink

Add support for the `--follow-symlinks` options to git-show. This
allows to write:

git show --follow-symlink HEAD:path-a-symlink

to get the content of the symlinked file.

Signed-off-by: Michael Vogt <m...@ubuntu.com>
---
 Documentation/git-show.txt |  6 +
 builtin/log.c  |  7 --
 revision.c |  2 ++
 revision.h |  1 +
 t/t1800-git-show.sh| 46 ++
 5 files changed, 60 insertions(+), 2 deletions(-)
 create mode 100755 t/t1800-git-show.sh

diff --git a/Documentation/git-show.txt b/Documentation/git-show.txt
index e73ef5401..d9f7fd90c 100644
--- a/Documentation/git-show.txt
+++ b/Documentation/git-show.txt
@@ -39,6 +39,12 @@ OPTIONS
For a more complete list of ways to spell object names, see
"SPECIFYING REVISIONS" section in linkgit:gitrevisions[7].
 
+--follow-symlinks::
+   Follow symlinks inside the repository when requesting objects
+   in extended revision syntax of the form tree-ish:path-in-tree.
+   Instead of output about the link itself, provide output about
+   the linked-to object.
+
 include::pretty-options.txt[]
 
 
diff --git a/builtin/log.c b/builtin/log.c
index 94ee177d5..e92af4fc7 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -142,7 +142,7 @@ static void cmd_log_init_finish(int argc, const char 
**argv, const char *prefix,
 struct rev_info *rev, struct setup_revision_opt *opt)
 {
struct userformat_want w;
-   int quiet = 0, source = 0, mailmap = 0;
+   int quiet = 0, source = 0, mailmap = 0, follow_symlinks = 0;
static struct line_opt_callback_data line_cb = {NULL, NULL, 
STRING_LIST_INIT_DUP};
static struct string_list decorate_refs_exclude = 
STRING_LIST_INIT_NODUP;
static struct string_list decorate_refs_include = 
STRING_LIST_INIT_NODUP;
@@ -162,6 +162,8 @@ static void cmd_log_init_finish(int argc, const char 
**argv, const char *prefix,
OPT_CALLBACK('L', NULL, _cb, "n,m:file",
 N_("Process line range n,m in file, counting from 
1"),
 log_line_range_callback),
+   OPT_BOOL(0, "follow-symlinks", _symlinks,
+N_("follow in-tree symlinks (used when showing file 
content)")),
OPT_END()
};
 
@@ -173,9 +175,10 @@ static void cmd_log_init_finish(int argc, const char 
**argv, const char *prefix,
 builtin_log_options, builtin_log_usage,
 PARSE_OPT_KEEP_ARGV0 | PARSE_OPT_KEEP_UNKNOWN |
 PARSE_OPT_KEEP_DASHDASH);
-
if (quiet)
rev->diffopt.output_format |= DIFF_FORMAT_NO_OUTPUT;
+   if (follow_symlinks)
+   rev->follow_symlinks = 1;
argc = setup_revisions(argc, argv, rev, opt);
 
/* Any arguments at this point are not recognized */
diff --git a/revision.c b/revision.c
index b42c836d7..4ab22313f 100644
--- a/revision.c
+++ b/revision.c
@@ -1678,6 +1678,8 @@ int handle_revision_arg(const char *arg_, struct rev_info 
*revs, int flags, unsi
 
if (revarg_opt & REVARG_COMMITTISH)
get_sha1_flags |= GET_OID_COMMITTISH;
+   if (revs && revs->follow_symlinks)
+   get_sha1_flags |= GET_OID_FOLLOW_SYMLINKS;
 
if (get_oid_with_context(arg, get_sha1_flags, , ))
return revs->ignore_missing ? 0 : -1;
diff --git a/revision.h b/revision.h
index b8c47b98e..060f1038a 100644
--- a/revision.h
+++ b/revision.h
@@ -122,6 +122,7 @@ struct rev_info {
first_parent_only:1,
line_level_traverse:1,
tree_blobs_in_commit_order:1,
+   follow_symlinks:1,
 
/* for internal use only */
exclude_promisor_objects:1;
diff --git a/t/t1800-git-show.sh b/t/t1800-git-show.sh
new file mode 100755
index 0..85541b4db
--- /dev/null
+++ b/t/t1800-git-show.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+
+test_description='Test git show works'
+
+. ./test-lib.sh
+
+test_expect_success 'verify git show HEAD:foo works' '
+printf "foo content\n" >foo &&
+git add foo &&
+git commit -m "added foo" &&
+git show HEAD:foo >actual &&
+printf "foo content\n" >expected &&
+test_cmp expected actual
+'
+
+test_expect_success 'verify git show HEAD:symlink shows symlink points to foo' 
'
+printf "foo content\n" >foo &&
+ln -s foo symlink &&
+git add foo symlink &&
+git commit -m "added foo and a symlink to foo" &&
+git show HEAD:foo >actual &&
+printf "foo content\n" >expected &&
+test_cmp expected actual &&

Re: [RFC PATCH] Add "git show --follow-symlinks HEAD:symlink"

On Fri, Apr 13, 2018 at 10:28:13AM -0700, Stefan Beller wrote:
> Hi Michael,
Hi Stefan,

> thanks for the patch,
Thanks for the review.

[..]
> The patch seems reasonable, apart from minor nits:
> In the test we'd prefer no whitespace on the right side of the redirection,
> i.e. echo content >foo

Sure, updated.

> Instead of evaluating git commands in shell and assigning it to a variable,
> we'd prefer to dump it to files:
[..]

Makes sense, updated.

> There is a typo &.

Ups, sorry! Fixed.

> Can we reword the documentation, such that we do not have
> an occurrence of "extended SHA-1" ?
[..]
> Maybe
> 
> Follow symlinks inside the repository when requesting
> objects in extended revision syntax of the form tree-ish:path-in-tree.

This looks very reasonable, I updated the documentation accordingly.

The update patch is attached as an inline attachement.

Cheers,
 Michael
>From dab10f5e5aea8a31cbee0ab1d5a78204c8c9832a Mon Sep 17 00:00:00 2001
From: Michael Vogt <m...@ubuntu.com>
Date: Mon, 9 Apr 2018 10:38:13 +0200
Subject: [PATCH] support: git show --follow-symlinks HEAD:symlink

Add support for the `--follow-symlinks` options to git-show. This
allows to write:

git show --follow-symlink HEAD:path-a-symlink

to get the content of the symlinked file.

Signed-off-by: Michael Vogt <m...@ubuntu.com>
---
 Documentation/git-show.txt |  6 +
 builtin/log.c  |  7 --
 revision.c |  2 ++
 revision.h |  1 +
 t/t1800-git-show.sh| 46 ++
 5 files changed, 60 insertions(+), 2 deletions(-)
 create mode 100755 t/t1800-git-show.sh

diff --git a/Documentation/git-show.txt b/Documentation/git-show.txt
index e73ef5401..d9f7fd90c 100644
--- a/Documentation/git-show.txt
+++ b/Documentation/git-show.txt
@@ -39,6 +39,12 @@ OPTIONS
 	For a more complete list of ways to spell object names, see
 	"SPECIFYING REVISIONS" section in linkgit:gitrevisions[7].
 
+--follow-symlinks::
+	Follow symlinks inside the repository when requesting objects
+	in extended revision syntax of the form tree-ish:path-in-tree.
+	Instead of output about the link itself, provide output about
+	the linked-to object.
+
 include::pretty-options.txt[]
 
 
diff --git a/builtin/log.c b/builtin/log.c
index 94ee177d5..e92af4fc7 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -142,7 +142,7 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
 			 struct rev_info *rev, struct setup_revision_opt *opt)
 {
 	struct userformat_want w;
-	int quiet = 0, source = 0, mailmap = 0;
+	int quiet = 0, source = 0, mailmap = 0, follow_symlinks = 0;
 	static struct line_opt_callback_data line_cb = {NULL, NULL, STRING_LIST_INIT_DUP};
 	static struct string_list decorate_refs_exclude = STRING_LIST_INIT_NODUP;
 	static struct string_list decorate_refs_include = STRING_LIST_INIT_NODUP;
@@ -162,6 +162,8 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
 		OPT_CALLBACK('L', NULL, _cb, "n,m:file",
 			 N_("Process line range n,m in file, counting from 1"),
 			 log_line_range_callback),
+		OPT_BOOL(0, "follow-symlinks", _symlinks,
+			 N_("follow in-tree symlinks (used when showing file content)")),
 		OPT_END()
 	};
 
@@ -173,9 +175,10 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
 			 builtin_log_options, builtin_log_usage,
 			 PARSE_OPT_KEEP_ARGV0 | PARSE_OPT_KEEP_UNKNOWN |
 			 PARSE_OPT_KEEP_DASHDASH);
-
 	if (quiet)
 		rev->diffopt.output_format |= DIFF_FORMAT_NO_OUTPUT;
+	if (follow_symlinks)
+		rev->follow_symlinks = 1;
 	argc = setup_revisions(argc, argv, rev, opt);
 
 	/* Any arguments at this point are not recognized */
diff --git a/revision.c b/revision.c
index b42c836d7..4ab22313f 100644
--- a/revision.c
+++ b/revision.c
@@ -1678,6 +1678,8 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 
 	if (revarg_opt & REVARG_COMMITTISH)
 		get_sha1_flags |= GET_OID_COMMITTISH;
+	if (revs && revs->follow_symlinks)
+		get_sha1_flags |= GET_OID_FOLLOW_SYMLINKS;
 
 	if (get_oid_with_context(arg, get_sha1_flags, , ))
 		return revs->ignore_missing ? 0 : -1;
diff --git a/revision.h b/revision.h
index b8c47b98e..060f1038a 100644
--- a/revision.h
+++ b/revision.h
@@ -122,6 +122,7 @@ struct rev_info {
 			first_parent_only:1,
 			line_level_traverse:1,
 			tree_blobs_in_commit_order:1,
+			follow_symlinks:1,
 
 			/* for internal use only */
 			exclude_promisor_objects:1;
diff --git a/t/t1800-git-show.sh b/t/t1800-git-show.sh
new file mode 100755
index 0..85541b4db
--- /dev/null
+++ b/t/t1800-git-show.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+
+test_description='Test git show works'
+
+. ./test-lib.sh
+
+test_expect_success 'verify git show HEAD:foo works' '
+printf "foo content\

Re: [RFC PATCH] Add "git show --follow-symlinks HEAD:symlink"

Hi Ævar,

thanks for your quick reply!

On Mon, Apr 09, 2018 at 11:28:45AM +0200, Ævar Arnfjörð Bjarmason wrote:
> On Mon, Apr 09 2018, Michael Vogt wrote:
[..]
> > Subject: [PATCH] support: git show --follow-symlinks HEAD:symlink
> >
> > Add support for the `--follow-symlinks` options to git-show. This
> > allows to write:
> >
> > git show --follow-symlink HEAD:path-a-symlink
> 
> The patch looks reasonable, but please submit it as described in
> Documentation/SubmittingPatches, i.e. inline instead of as an
> attachment, and with a signed-off-by line etc. We'd also need some tests
> for this.

Thanks for the intial reivew. I updated the patch with a test and
documentation for the new option. Happy to merge the test into one of
the existing test files, I read t/README and greping around I did not
find a place that looked like a good fit. 

I added the updated patch as an mutt inline attachment now.

Cheers,
 Michael
>From 5a9faa9eff00f316fc654c8e3bc85c3ba56ea659 Mon Sep 17 00:00:00 2001
From: Michael Vogt <m...@ubuntu.com>
Date: Mon, 9 Apr 2018 10:38:13 +0200
Subject: [PATCH] support: git show --follow-symlinks HEAD:symlink

Add support for the `--follow-symlinks` options to git-show. This
allows to write:

git show --follow-symlink HEAD:path-a-symlink

to get the content of the symlinked file.

Signed-off-by: Michael Vogt <m...@ubuntu.com>
---
 Documentation/git-show.txt |  6 ++
 builtin/log.c  |  7 +--
 revision.c |  2 ++
 revision.h |  1 +
 t/t1800-git-show.sh| 41 ++
 5 files changed, 55 insertions(+), 2 deletions(-)
 create mode 100755 t/t1800-git-show.sh

diff --git a/Documentation/git-show.txt b/Documentation/git-show.txt
index e73ef5401..fa751c35d 100644
--- a/Documentation/git-show.txt
+++ b/Documentation/git-show.txt
@@ -39,6 +39,12 @@ OPTIONS
 	For a more complete list of ways to spell object names, see
 	"SPECIFYING REVISIONS" section in linkgit:gitrevisions[7].
 
+--follow-symlinks::
+	Follow symlinks inside the repository when requesting objects
+	with extended SHA-1 expressions of the form tree-ish:path-in-tree.
+	Instead of output about the link itself, provide output about
+	the linked-to object.
+
 include::pretty-options.txt[]
 
 
diff --git a/builtin/log.c b/builtin/log.c
index 94ee177d5..e92af4fc7 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -142,7 +142,7 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
 			 struct rev_info *rev, struct setup_revision_opt *opt)
 {
 	struct userformat_want w;
-	int quiet = 0, source = 0, mailmap = 0;
+	int quiet = 0, source = 0, mailmap = 0, follow_symlinks = 0;
 	static struct line_opt_callback_data line_cb = {NULL, NULL, STRING_LIST_INIT_DUP};
 	static struct string_list decorate_refs_exclude = STRING_LIST_INIT_NODUP;
 	static struct string_list decorate_refs_include = STRING_LIST_INIT_NODUP;
@@ -162,6 +162,8 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
 		OPT_CALLBACK('L', NULL, _cb, "n,m:file",
 			 N_("Process line range n,m in file, counting from 1"),
 			 log_line_range_callback),
+		OPT_BOOL(0, "follow-symlinks", _symlinks,
+			 N_("follow in-tree symlinks (used when showing file content)")),
 		OPT_END()
 	};
 
@@ -173,9 +175,10 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
 			 builtin_log_options, builtin_log_usage,
 			 PARSE_OPT_KEEP_ARGV0 | PARSE_OPT_KEEP_UNKNOWN |
 			 PARSE_OPT_KEEP_DASHDASH);
-
 	if (quiet)
 		rev->diffopt.output_format |= DIFF_FORMAT_NO_OUTPUT;
+	if (follow_symlinks)
+		rev->follow_symlinks = 1;
 	argc = setup_revisions(argc, argv, rev, opt);
 
 	/* Any arguments at this point are not recognized */
diff --git a/revision.c b/revision.c
index b42c836d7..4ab22313f 100644
--- a/revision.c
+++ b/revision.c
@@ -1678,6 +1678,8 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 
 	if (revarg_opt & REVARG_COMMITTISH)
 		get_sha1_flags |= GET_OID_COMMITTISH;
+	if (revs && revs->follow_symlinks)
+		get_sha1_flags |= GET_OID_FOLLOW_SYMLINKS;
 
 	if (get_oid_with_context(arg, get_sha1_flags, , ))
 		return revs->ignore_missing ? 0 : -1;
diff --git a/revision.h b/revision.h
index b8c47b98e..060f1038a 100644
--- a/revision.h
+++ b/revision.h
@@ -122,6 +122,7 @@ struct rev_info {
 			first_parent_only:1,
 			line_level_traverse:1,
 			tree_blobs_in_commit_order:1,
+			follow_symlinks:1,
 
 			/* for internal use only */
 			exclude_promisor_objects:1;
diff --git a/t/t1800-git-show.sh b/t/t1800-git-show.sh
new file mode 100755
index 0..86fe8ee02
--- /dev/null
+++ b/t/t1800-git-show.sh
@@ -0,0 +1,41 @@
+#!/bin/sh
+
+test_description='Test git show works'
+
+. ./test-lib.sh
+
+test_expect_success 'verify git

Re: [PATCH 12/16] refs: store the main ref store inside the repository struct

2018-04-10 Thread Michael Haggerty

On 04/10/2018 12:45 AM, Stefan Beller wrote:
> Signed-off-by: Stefan Beller <sbel...@google.com>
> ---
>  refs.c   | 13 +
>  refs.h   |  4 +---
>  repository.h |  3 +++
>  3 files changed, 9 insertions(+), 11 deletions(-)
> 
> diff --git a/refs.c b/refs.c
> index f58b9fb7df..b5be754a97 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1608,9 +1608,6 @@ static struct ref_store_hash_entry 
> *alloc_ref_store_hash_entry(
>   return entry;
>  }
>  
> -/* A pointer to the ref_store for the main repository: */
> -static struct ref_store *main_ref_store;
> -
>  /* A hashmap of ref_stores, stored by submodule name: */
>  static struct hashmap submodule_ref_stores;
>  
> @@ -1652,13 +1649,13 @@ static struct ref_store *ref_store_init(const char 
> *gitdir,
>   return refs;
>  }
>  
> -struct ref_store *get_main_ref_store_the_repository(void)
> +struct ref_store *get_main_ref_store(struct repository *r)
>  {
> - if (main_ref_store)
> - return main_ref_store;
> + if (r->main_ref_store)
> + return r->main_ref_store;
>  
> - main_ref_store = ref_store_init(get_git_dir(), REF_STORE_ALL_CAPS);
> - return main_ref_store;
> + r->main_ref_store = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
> + return r->main_ref_store;
>  }
>  
>  /*
> diff --git a/refs.h b/refs.h
> index ab3d2bec2f..f5ab68c0ed 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -760,9 +760,7 @@ int reflog_expire(const char *refname, const struct 
> object_id *oid,
>  
>  int ref_storage_backend_exists(const char *name);
>  
> -#define get_main_ref_store(r) \
> - get_main_ref_store_##r()
> -struct ref_store *get_main_ref_store_the_repository(void);
> +struct ref_store *get_main_ref_store(struct repository *r);
>  /*
>   * Return the ref_store instance for the specified submodule. For the
>   * main repository, use submodule==NULL; such a call cannot fail. For
> diff --git a/repository.h b/repository.h
> index 09df94a472..7d0710b273 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -26,6 +26,9 @@ struct repository {
>*/
>   struct raw_object_store *objects;
>  
> + /* The store in which the refs are held. */
> + struct ref_store *main_ref_store;
> +
>   /*
>* Path to the repository's graft file.
>* Cannot be NULL after initialization.
> 

This also makes sense to me, as far as it goes. I have a few comments
and questions:

Why do you call the new member `main_ref_store`? Is there expected to be
some other `ref_store` associated with a repository?

I think the origin of the name `main_ref_store` was to distinguish it
from submodule ref stores. But presumably those will soon become the
"main" ref stores for their respective submodule repository objects,
right? So maybe calling things `repository.ref_store` and
`get_ref_store(repository)` would be appropriate.

There are some places in the reference code that only work with the main
repository. The ones that I can think of are:

* `ref_resolves_to_object()` depends on an object store.

* `peel_object()` and `ref_iterator_peel()` also have to look up objects
(in this case, tag objects).

* Anything that calls `files_assert_main_repository()` or depends on
`REF_STORE_MAIN` isn't implemented for other reference stores (usually,
I think, these are functions that depend on the object store).

Some of these things might be easy to generalize to non-main
repositories, but I didn't bother because AFAIK currently only the main
repository store is ever mutated.

You can move a now-obsolete comment above the definition of `struct
files_ref_store` if you haven't in some other patch (search for
"libification").

Hope that helps,
Michael

Re: [PATCH 07/16] refs: add repository argument to get_main_ref_store

2018-04-10 Thread Michael Haggerty

On 04/10/2018 12:45 AM, Stefan Beller wrote:
> Add a repository argument to allow the get_main_ref_store caller
> to be more specific about which repository to handle. This is a small
> mechanical change; it doesn't change the implementation to handle
> repositories other than the_repository yet.
> 
> As with the previous commits, use a macro to catch callers passing a
> repository other than the_repository at compile time.

This seems OK to me from a refs perspective.

The macro trick is surprising. I guess it gets you a compile-time check,
under the assumption that nothing else is called `the_repository`. But
why actually commit the macro, as opposed to compiling once locally to
check for correctness, then maybe add something like `assert(r ==
the_repository)` for the actual commit?

But I don't care either way, since the macro disappears again soon.

Michael

[RFC PATCH] Add "git show --follow-symlinks HEAD:symlink"

2018-04-09 Thread Michael Vogt

Hi,

I noticed that `git show HEAD:path-to-symlink` does not work and
returns an error like:
"fatal: Path 'debian/changelog' exists on disk, but not in 'HEAD'."

Looking at `git show` it seems there is no way right now to make
git-show show blobs if they are symlinks [1].

It would be nice to have this ability. Attached is a draft patch to
allow to write: `git show --follow-symlinks HEAD:path-to-symlink`.
Tests are missing in the patch, I'm happy to add those if there is a
chance for the feature to get in.

Cheers,
 Michael

[1] Using `git cat-file --follow-symlinks --batch < input` works but
feels a bit less elegant compared to supporting it directly in
git-show.
>From 616b7f21c057656960cb6b8a266095bbef734122 Mon Sep 17 00:00:00 2001
From: Michael Vogt <m...@ubuntu.com>
Date: Mon, 9 Apr 2018 10:38:13 +0200
Subject: [PATCH] support: git show --follow-symlinks HEAD:symlink

Add support for the `--follow-symlinks` options to git-show. This
allows to write:

git show --follow-symlink HEAD:path-a-symlink

to get the content of the symlinked file.
---
 builtin/log.c | 7 +--
 revision.c| 2 ++
 revision.h| 1 +
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 94ee177d5..e92af4fc7 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -142,7 +142,7 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
 			 struct rev_info *rev, struct setup_revision_opt *opt)
 {
 	struct userformat_want w;
-	int quiet = 0, source = 0, mailmap = 0;
+	int quiet = 0, source = 0, mailmap = 0, follow_symlinks = 0;
 	static struct line_opt_callback_data line_cb = {NULL, NULL, STRING_LIST_INIT_DUP};
 	static struct string_list decorate_refs_exclude = STRING_LIST_INIT_NODUP;
 	static struct string_list decorate_refs_include = STRING_LIST_INIT_NODUP;
@@ -162,6 +162,8 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
 		OPT_CALLBACK('L', NULL, _cb, "n,m:file",
 			 N_("Process line range n,m in file, counting from 1"),
 			 log_line_range_callback),
+		OPT_BOOL(0, "follow-symlinks", _symlinks,
+			 N_("follow in-tree symlinks (used when showing file content)")),
 		OPT_END()
 	};
 
@@ -173,9 +175,10 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
 			 builtin_log_options, builtin_log_usage,
 			 PARSE_OPT_KEEP_ARGV0 | PARSE_OPT_KEEP_UNKNOWN |
 			 PARSE_OPT_KEEP_DASHDASH);
-
 	if (quiet)
 		rev->diffopt.output_format |= DIFF_FORMAT_NO_OUTPUT;
+	if (follow_symlinks)
+		rev->follow_symlinks = 1;
 	argc = setup_revisions(argc, argv, rev, opt);
 
 	/* Any arguments at this point are not recognized */
diff --git a/revision.c b/revision.c
index b42c836d7..4ab22313f 100644
--- a/revision.c
+++ b/revision.c
@@ -1678,6 +1678,8 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 
 	if (revarg_opt & REVARG_COMMITTISH)
 		get_sha1_flags |= GET_OID_COMMITTISH;
+	if (revs && revs->follow_symlinks)
+		get_sha1_flags |= GET_OID_FOLLOW_SYMLINKS;
 
 	if (get_oid_with_context(arg, get_sha1_flags, , ))
 		return revs->ignore_missing ? 0 : -1;
diff --git a/revision.h b/revision.h
index b8c47b98e..060f1038a 100644
--- a/revision.h
+++ b/revision.h
@@ -122,6 +122,7 @@ struct rev_info {
 			first_parent_only:1,
 			line_level_traverse:1,
 			tree_blobs_in_commit_order:1,
+			follow_symlinks:1,
 
 			/* for internal use only */
 			exclude_promisor_objects:1;
-- 
2.14.1

[RFC 1/1] packed-backend: ignore broken headers

2018-03-26 Thread Michael Haggerty

Prior to

9308b7f3ca read_packed_refs(): die if `packed-refs` contains bogus data, 
2017-07-01

we silently ignored any lines in a `packed-refs` file that we didn't
understand. That policy was clearly wrong.

But at the time, unrecognized header lines were processed by the same
code as reference and peeled lines. This means that they were also
subject to the same liberal treatment. For example, any of the
following "header" lines would have been ignored:

* "# arbitrary data that looks like a comment"
* "# pack-refs with peeled fully-peeled" ← note: missing colon
* "# pack-refs"

Loosen up the parser to ignore any first line that begins with `#` but
doesn't start with the exact character sequence "# pack-refs with:".

(In fact, the old liberal policy meant that "comment" lines would have
been ignored anywhere in the file. But the file format isn't actually
documented to allow comments, and comments make little sense in this
file, so we won't go that far.)

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
I *don't* think we actually want to merge this patch. See the cover
letter for my reasoning.

 refs/packed-backend.c | 21 +
 t/t3210-pack-refs.sh  | 33 -
 2 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 65288c6472..f9d71bb60d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -635,21 +635,18 @@ static struct snapshot *create_snapshot(struct 
packed_ref_store *refs)
 
tmp = xmemdupz(snapshot->buf, eol - snapshot->buf);
 
-   if (!skip_prefix(tmp, "# pack-refs with:", (const char **)))
-   die_invalid_line(refs->path,
-snapshot->buf,
-snapshot->eof - snapshot->buf);
+   if (skip_prefix(tmp, "# pack-refs with:", (const char **))) {
+   string_list_split_in_place(, p, ' ', -1);
 
-   string_list_split_in_place(, p, ' ', -1);
+   if (unsorted_string_list_has_string(, 
"fully-peeled"))
+   snapshot->peeled = PEELED_FULLY;
+   else if (unsorted_string_list_has_string(, 
"peeled"))
+   snapshot->peeled = PEELED_TAGS;
 
-   if (unsorted_string_list_has_string(, "fully-peeled"))
-   snapshot->peeled = PEELED_FULLY;
-   else if (unsorted_string_list_has_string(, "peeled"))
-   snapshot->peeled = PEELED_TAGS;
+   sorted = unsorted_string_list_has_string(, 
"sorted");
 
-   sorted = unsorted_string_list_has_string(, "sorted");
-
-   /* perhaps other traits later as well */
+   /* perhaps other traits later as well */
+   }
 
/* The "+ 1" is for the LF character. */
snapshot->start = eol + 1;
diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
index afa27ffe2d..353ef3e655 100755
--- a/t/t3210-pack-refs.sh
+++ b/t/t3210-pack-refs.sh
@@ -20,7 +20,8 @@ test_expect_success \
 'echo Hello > A &&
  git update-index --add A &&
  git commit -m "Initial commit." &&
- HEAD=$(git rev-parse --verify HEAD)'
+ HEAD=$(git rev-parse --verify HEAD) &&
+ git tag -m "A tag" annotated-tag'
 
 SHA1=
 
@@ -221,6 +222,36 @@ test_expect_success 'reject packed-refs with a short 
SHA-1' '
test_cmp expected_err err
 '
 
+test_expect_success 'handle packed-refs with a bogus header' '
+   git show-ref -d >expected &&
+   mv .git/packed-refs .git/packed-refs.bak &&
+   test_when_finished "mv .git/packed-refs.bak .git/packed-refs" &&
+   sed -e "s/^#.*/# whacked-refs/" <.git/packed-refs.bak >.git/packed-refs 
&&
+   git show-ref -d >actual 2>err &&
+   test_cmp expected actual &&
+   test_must_be_empty err
+'
+
+test_expect_success 'handle packed-refs with a truncated header' '
+   git show-ref -d >expected &&
+   mv .git/packed-refs .git/packed-refs.bak &&
+   test_when_finished "mv .git/packed-refs.bak .git/packed-refs" &&
+   sed -e "s/^#.*/# pack-refs/" <.git/packed-refs.bak >.git/packed-refs &&
+   git show-ref -d >actual 2>err &&
+   test_cmp expected actual &&
+   test_must_be_empty err
+'
+
+test_expect_success 'handle packed-refs with no traits in header' '
+   git show-ref -d >expected &&
+   mv .git/packed-refs .git/packed-refs.bak &&
+   test_when_finished "mv .git/packed-refs.bak .git/packed-refs" &&
+   sed -e "s/^#.*/# pack-refs with:/" <.git/packed-refs.bak 
>.git/packed-refs &&
+   git show-ref -d >actual 2>err &&
+   test_cmp expected actual &&
+   test_must_be_empty err
+'
+
 test_expect_success 'timeout if packed-refs.lock exists' '
LOCK=.git/packed-refs.lock &&
>"$LOCK" &&
-- 
2.14.2

[RFC 0/1] Tolerate broken headers in `packed-refs` files

2018-03-26 Thread Michael Haggerty

Prior to

9308b7f3ca read_packed_refs(): die if `packed-refs` contains bogus data, 
2017-07-01

we silently ignored pretty much any bogus data in a `packed-refs`
file. I think that was pretty clearly a bad policy. The above commit
made parsing quite a bit stricter, calling `die()` if it found any
lines that it didn't understand.

But there's one situation that is maybe not quite so clear-cut. The
first line of a `packed-refs` file can be a header that enumerates
some traits of the file containing it; for example,

# pack-refs with: peeled fully-peeled 

The old code would have tolerated lots of breakage in that line. For
example, any of the following headers would have just been ignored:

# arbitrary data that looks like a comment
# pack-refs with peeled fully-peeled  ← note: missing colon
# pack-refs

Now, any of the above lines are considered errors and cause git to
die.

In my opinion, that is a good thing and we *shouldn't* tolerate broken
header lines; i.e., the status quo is good and we *shouldn't* apply
this patch.

But there might be some tools out in the wild that have been writing
broken headers. In that case, users who upgrade Git might suddenly
find that they can't read repositories that they could read before. In
fact, a tool that we wrote and use internally at GitHub was doing
exactly that, which is how we discovered this "problem".

This patch shows what it would look like to relax the parsing again,
albeit *only* for the first line of the file, and *only* for lines
that start with '#'.

The problem with this patch is that it would make it harder for people
who implement broken tools in the future to discover their mistakes.
The only result of the error would be that it is slower to work with
the `packed-refs` files that they wrote. Such an error could go
undiscovered for a long time.

The tighter check was released quite a while ago, and AFAIK we haven't
had any bug reports from people tripped up by this consistency check.
So I'm inclined to believe that the existing tools are OK and this
patch would be counterproductive. But I wanted to share it with the
list anyway.

Michael

Michael Haggerty (1):
  packed-backend: ignore broken headers

 refs/packed-backend.c | 21 +
 t/t3210-pack-refs.sh  | 33 -
 2 files changed, 41 insertions(+), 13 deletions(-)

-- 
2.14.2

Re: [ANNOUNCE] git-sizer: compute various size-related metrics for your Git repository

2018-03-18 Thread Michael Haggerty

On Fri, Mar 16, 2018 at 10:29 PM, Jeff King <p...@peff.net> wrote:
> On Fri, Mar 16, 2018 at 09:01:42PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> One thing that can make repositories very pathological is if the ratio
>> of trees to commits is too low.
>>
>> I was dealing with a repo the other day that had several thousand files
>> all in the same root directory, and no subdirectories.
>
> We've definitely run into this problem before (CocoaPods/Specs, for
> example). The metric that would hopefully show this off is "what is the
> tree object with the most entries". Or possibly "what is the average
> number of entries in a tree object".

I find that the best metric for determining this sort of problem is
"Overall repository size -> Trees -> Total tree entries". If you have
a big directory that is being changed frequently, the *real* problem
is that every commit has to rewrite the whole tree, with all of its
many entries. So "Total tree entries" (or equivalently, the total tree
size) skyrockets. And this means that a history traversal has to
*expand* all of those trees again. So a repository that is problematic
for this reason will have a very large number of tree entries.

If you want to detect a bad repository layout like this *before* it
becomes a problem, then probably something like "average tree entries
per commit" might be a good leading indicator of a problem.

Michael

Confusing behavior with ignored submodules and `git commit -a`

2018-03-16 Thread Michael Forney

Hi,

In the past few months have noticed some confusing behavior with
ignored submodules. I finally got around to bisecting this to commit
5556808690ea245708fb80383be5c1afee2fb3eb (add, reset: ensure
submodules can be added or reset).

Here is a demonstration of the problem:

First some repository initialization with a submodule marked as ignored.

$ git init inner && git -C inner commit --allow-empty -m 'inner 1'
Initialized empty Git repository in /tmp/inner/.git/
[master (root-commit) ef55bed] inner 1
$ git init outer && cd outer
Initialized empty Git repository in /tmp/outer/.git/
$ git submodule add ../inner
Cloning into '/tmp/outer/inner'...
done.
$ echo 1 > foo.txt && git add foo.txt
$ git commit -m 'outer 1'
[master (root-commit) efeb85c] outer 1
 3 files changed, 5 insertions(+)
 create mode 100644 .gitmodules
 create mode 100644 foo.txt
 create mode 16 inner
$ git config submodule.inner.ignore all
$ git -C inner commit --allow-empty -m 'inner 2'
[master 7b7f0fa] inner 2
$ git status
On branch master
nothing to commit, working tree clean
$

Up to here is all expected. However, if I go to update `foo.txt` and
commit with `git commit -a`, changes to inner get recorded
unexpectedly. What's worse is the shortstat output of `git commit -a`,
and the diff output of `git show` give no indication that the
submodule was changed.

$ echo 2 > foo.txt
$ git status
On branch master
Changes not staged for commit:
  (use "git add ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working directory)

modified:   foo.txt

no changes added to commit (use "git add" and/or "git commit -a")
$ git commit -a -m 'update foo.txt'
[master 6ec564c] update foo.txt
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git show
commit 6ec564c15ddae099c71f01750b4c434557525653 (HEAD -> master)
Author: Michael Forney <mfor...@mforney.org>
Date:   Fri Mar 16 20:18:37 2018 -0700

update foo.txt

diff --git a/foo.txt b/foo.txt
index d00491f..0cfbf08 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1 +1 @@
-1
+2
$

There have been a couple occasions where I accidentally pushed local
changes to ignored submodules because of this. Since they don't show
up in the log output, it is difficult to figure out what actually has
gone wrong.

Anyway, since the bisected commit (555680869) only mentions add and
reset, I'm assuming that this is a regression and not a deliberate
behavior change. The documentation for submodule..ignore states
that the setting should only affect `git status` and the diff family.
In terms of my expectations, I would go further and say it should only
affect `git status` and diffs against the working tree.

I took a brief look through the relevant sources, and it wasn't clear
to me how to fix this without accidentally changing the behavior of
other subcommands.

Any help with this issue is appreciated!

[ANNOUNCE] git-sizer: compute various size-related metrics for your Git repository

2018-03-16 Thread Michael Haggerty

What makes a Git repository unwieldy to work with and host? It turns
out that the respository's on-disk size in gigabytes is only part of
the story. From our experience at GitHub, repositories cause problems
because of poor internal layout at least as often as because of their
overall size. For example,

* blobs or trees that are too large
* large blobs that are modified frequently (e.g., database dumps)
* large trees that are modified frequently
* trees that expand to unreasonable size when checked out (e.g., "Git
bombs" [2])
* too many tiny Git objects
* too many references
* other oddities, such as giant octopus merges, super long reference
names or file paths, huge commit messages, etc.

`git-sizer` [1] is a new open-source tool that computes various
size-related statistics for a Git repository and points out those that
are likely to cause problems or inconvenience to its users.

I tried to make the output of `git-sizer` "opinionated" and easy to
interpret. Example output for the Linux kernel is appended below. I
also made it memory-efficient and resistant against git bombs.

I've written a blog post [3] about `git-sizer` with more explanation
and examples, and the main project page [1] has a long README with
some information about what the individual metrics mean and tips for
fixing problems.

I also put quite a bit of effort into making `git-sizer` fast. It does
its work (including figuring out path names for large objects) based
on a single traversal of the repository history using `git rev-list
--objects --reverse [...]`, followed by using the output of `git
cat-file --batch` or `git cat-file --batch-check` to get information
about individual objects.

On that subject, let me share some more technical details. `git-sizer`
is written in Go. I prototyped several ways of extracting object
information, which is critical to the performance because `git-sizer`
has to read all of the reachable non-blob objects in the repository.
The results surprised me:

| Mechanism for accessing Git data| Time   |
| --- | -: |
| `libgit2/git2go`| 25.5 s |
| `libgit2/git2go` with `ManagedTree` optimization| 18.9 s |
| `src-d/go-git`  | 63.0 s |
| Git command line client |  6.6 s |

It was almost a factor of four faster to read and parse the output of
Git plumbing commands (mainly `git for-each-ref`, `git rev-list
--objects`, `git cat-file --batch-check`, and `git cat-file --batch`)
than it was to use the Go bindings to libgit2. (I expect that part of
the reason is that Go's peculiar stack layout makes it quite expensive
to call out to C.) Even after Carlos Martin implemented an
experimental `ManagedTree` optimization that removed the need to call
C for every entry in a tree, it was still not competitive with the Git
CLI. `go-git`, which is a Git implementation in pure Go, was even
slower. So the final version of `git-sizer` calls `git` for accessing
the repository.

Feedback is welcome, including about the weightings [4] that I use to
compute the "level of concern" of the various metrics.

Have fun,
Michael

[1] https://github.com/github/git-sizer
[2] https://kate.io/blog/git-bomb/
[3] 
https://blog.github.com/2018-03-05-measuring-the-many-sizes-of-a-git-repository/
[4] 
https://github.com/github/git-sizer/blob/2e9a30f241ac357f2af01d42f0dd51fbbbae4b0b/sizes/output.go#L330-L401

$ git-sizer --verbose
Processing blobs: 1652370
Processing trees: 3396199
Processing commits: 722647
Matching commits to trees: 722647
Processing annotated tags: 534
Processing references: 539
| Name | Value | Level of concern   |
|  | - | -- |
| Overall repository size  |   ||
| * Commits|   ||
|   * Count|   723 k   | *  |
|   * Total size   |   525 MiB | ** |
| * Trees  |   ||
|   * Count|  3.40 M   | ** |
|   * Total size   |  9.00 GiB |    |
|   * Total tree entries   |   264 M   | *  |
| * Blobs  |   ||
|   * Count|  1.65 M   | *  |
|   * Total size   |  55.8 GiB | *  |
| * Annotated tags |   ||
|   * Count|   534 ||
| * References |   ||
|   * Co

Re: [git-sizer] Implications of a large commit object

2018-03-14 Thread Michael Haggerty

On Wed, Mar 14, 2018 at 9:14 AM, Lars Schneider
<larsxschnei...@gmail.com> wrote:
> I am using Michael's fantastic Git repo analyzer tool "git-sizer" [*]
> and it detected a very large commit of 7.33 MiB in my repo (see chart
> below).
>
> This large commit is expected. I've imported that repo from another
> version control system but excluded all binary files (e.g. images) and
> some 3rd party components as their history is not important [**]. I've
> reintroduced these files in the head commit again. This is where the
> large commit came from.
>
> This repo is not used in production yet but I wonder if this kind of
> approach can cause trouble down the line? Are there any relevant
> implication of a single large commit like this in history?
> [...]
>
> ###
> ## git-sizer output
>
> [...]
> | Name | Value | Level of concern   |
> |  | - | -- |
> [...]
> | Biggest objects  |   ||
> | * Commits|   ||
> |   * Maximum size [1] |  7.33 MiB | !! |
> [...]

The "commit size" that is being referred to here is the size of the
actual commit object; i.e., the author name, parent commits, etc plus
the log message. So a huge commit probably means that you have a huge
log message. This has nothing to do with the number or sizes of the
files added by the commit.

Maybe your migration tool created a huge commit message, for example
listing each of the files that was changed.

AFAIK this won't cause Git itself any problems, but it's likely to be
inconvenient. For example, when you type `git log` and 7 million
characters page by. Or when you use some GUI tool to view your history
and it performs badly because it wasn't built to handle such enormous
commit messages.

Michael

Re: [PATCH] sq_dequote: fix extra consumption of source string

2018-02-18 Thread Michael Haggerty

On Wed, Feb 14, 2018 at 12:41 AM, Jeff King <p...@peff.net> wrote:
> This fixes a (probably harmless) parsing problem in
> sq_dequote_step(), in which we parse some bogus input
> incorrectly rather than complaining that it's bogus.
> [...]

LGTM. Thanks for taking care of this.

Michael

Re: make git diff output easier to read - use better diff heuristics

2018-02-13 Thread Michael Haggerty

On Tue, Feb 13, 2018 at 7:25 PM, Σπύρος Βαζαίος <sbaza...@gmail.com> wrote:
> While I din't have the experience to express an opinion on this
> matter, I have to say that the --no-indent-heuristic that Jeff
> suggested worked great.
> There were more than a handful of cases that this issue happened in my
> diff file (all were the same: #endif followed by #ifdef).
> Oh, and the language is C indeed.

The "indent heuristic" algorithm that Git now uses by default is
nothing more than that—a heuristic—so it can be fooled. It bases its
decision on the locations of blank lines and the indentations of
non-blank lines. In the vast majority of cases it gives the same or
better results than the old algorithm, but there are some cases, like
yours, where it gives aesthetically less pleasing (though still
correct) results.

The algorithm usually handles C code well, but it tends to be confused
by preprocessor directives, because they are not indented like typical
code. It might be possible to tweak the weights to get it to handle
preprocessor directives better, but that causes it to do worse on
other, more common things like Python code (where blocks are preceded
but not followed by a line with lesser indentation).

Doing significantly better probably would require some amount of
language-awareness, but that's a bigger job than I was willing to take
on.

Michael

Windows: mintty.exe classified as exploit by AV software

2018-02-07 Thread Ehrt, Michael

Hi everyone,

a few days ago I installed version 2.16.1.2, downloaded from 
https://git-scm.com/download/win on my Windows 7 system. The OS is Windows 7 
Enterprise 64bit, Build 7601/SP1, in case it matters. This is a first time 
install, not an upgrade.

Our current virus protection software is Cylance, from 
https://www.cylance.com/en_us/home.html

During install, several executions of 
C:\Program Files\Git\usr\bin\bash.exe
were blocked, the violation being given as "Stack Pivot". Our admins then 
temporarily lifted some rules for my device so that I could properly install it.

But now, when I start ...
"C:\Program Files\Git\git-bash.exe" --cd-to-home
... Cylance classifies it as an Exploit, and blocks execution with the 
following messages:
Category: Exploit
Event: Blocked
Details: Violation: StackProtect; Application: C:\Program 
Files\Git\usr\bin\mintty.exe
(Screenshot available if needed)

If I start ...
C:\Program Files\Git\usr\bin\mintty.exe
directly, and choose the 64 bit version from the dialog, it is allowes to start 
without getting blocked.

My current problem is that the security guys don't want to see this software 
installed on my machine because of this.
And as Cylance is not a pattern-based AV, it's not something that will go away 
by waiting for the next daily update ...

Any ideas about this?

Thanks

Michael

git add --all does not respect submodule..ignore

2018-01-30 Thread Michael Scott-Nelson

Giving a submodule "ignore=all" or "ignore=dirty" in .gitmodule
successfully removes that module from `git status` queries. However,
these same diffs are automatically added by git-add, eg `git add .` or
`git add --all` adds the submodules that we want ignored. This seems
inconsistent and confusing.

Workflow reason:
We want to be able to have supers and subs checked-out, make changes
to both, but only update the SHA-1 pointer from super to sub when
explicitly forced to do so, eg `git add -f subName`. This workflow
prevents continuous merge conflicts from clashing SHA-1 pointers while
still allowing `git add --all`, allowing a sort of middle ground
between submodules and an untracked library.

Teaching git-add about submodule.ignore and/or teaching .gitignore
about submodules would be awesome.

Also experimented with `git update-index --assume-unchanged subName`,
but I believe that it does not get committed and besides also does not
seem to have a way to `git add -f`.
---
Note: currently on git version 2.14.1, but looking through the
changelogs, did not see any changes since then that would enable this
workflow.

-Michael Scott-Nelson

Re: Git For Aix 6 and 7

2018-01-28 Thread Michael Felt

I have git on my portal www.aixtools.net (See
http://www.aixtools.net/index.php/git)

These are installp packages - and what you install, you can uninstall.

There are some dependencies (e.g., python, gnu grep, and a few
others). Can't say I use it daily, but I do use it weekly.

For additional questions or issues with this packaging - please post
on http://forums.rootvg.net/aixtools - I see that a lot sooner than
any email (on gmail).

HTH,
Michael

On Thu, Jan 18, 2018 at 3:47 PM, Ævar Arnfjörð Bjarmason
<ava...@gmail.com> wrote:
>
> On Thu, Jan 18 2018, raikrishna jotted:
>
>> Hi Team,
>>
>> I have an urgent requirement to install Git client for Aix 6 and 7, could
>> you please help send me or navigate me to the correct url.
>> My present infrastructure comprise of Aix and Linux servers , I am
>> successfully using Git on Linux however I am struggling to find correct
>> package for AIX platform.
>>
>> Appreciate your quick response.
>
> Hi raikrishna. The git-packagers list is a rather small list so perhaps
> someone on the general git list (CC'd) knows the answer to this.
>
> I'm not aware of anyone providing binary git packages for AIX, but I
> don't use it so maybe they exist.
>
> The last mention on the mailing list I could find of someone packaging
> it was this from Michael Felt's (CC'd)
> https://public-inbox.org/git/canvxnixkbakgjm+nz0cyyctoeyp23kd8s4yxsquosauahjs...@mail.gmail.com/
>
> The last AIX-related patch to git is actually mine, but I haven't logged
> into an AIX box in over a decade, see
> https://github.com/chef/omnibus-software/commit/e247e36761#diff-3df898345d670979b74acc0bf71d8c47
>
> So it looks like there's a chef build recipe for it, maybe that's
> something you can use?
>
> I would not be surprised if building git on AIX, particularly with a
> non-GNU toolchain, fails in all sorts of interesting ways. People here
> on the list would be happy to help you work through those failures,
> we're keen to port git to whatever we can get our hands on, but these
> platforms experience quite a bit of bitrot.

[PATCH 4/6] packed_ref_iterator_begin(): make optimization more general

We can return an empty iterator not only if the `packed-refs` file is
missing, but also if it is empty or if there are no references whose
names succeed `prefix`. Optimize away those cases as well by moving
the call to `find_reference_location()` higher in the function and
checking whether the determined start position is the same as
`snapshot->eof`. (This is possible now because the previous commit
made `find_reference_location()` robust against empty snapshots.)

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 361affd7ad..988c45402b 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -927,7 +927,12 @@ static struct ref_iterator *packed_ref_iterator_begin(
 */
snapshot = get_snapshot(refs);
 
-   if (!snapshot->buf)
+   if (prefix && *prefix)
+   start = find_reference_location(snapshot, prefix, 0);
+   else
+   start = snapshot->start;
+
+   if (start == snapshot->eof)
return empty_ref_iterator_begin();
 
iter = xcalloc(1, sizeof(*iter));
@@ -937,11 +942,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
iter->snapshot = snapshot;
acquire_snapshot(snapshot);
 
-   if (prefix && *prefix)
-   start = find_reference_location(snapshot, prefix, 0);
-   else
-   start = snapshot->start;
-
iter->pos = start;
iter->eof = snapshot->eof;
strbuf_init(>refname_buf, 0);
-- 
2.14.2

[PATCH 5/6] load_contents(): don't try to mmap an empty file

We don't actually create zero-length `packed-refs` files, but they are
valid and we should handle them correctly. The old code `xmmap()`ed
such files, which led to an error when `munmap()` was called. So, if
the `packed-refs` file is empty, leave the snapshot at its zero values
and return 0 without trying to read or mmap the file.

Returning 0 also makes `create_snapshot()` exit early, which avoids
the technically undefined comparison `NULL < NULL`.

Reported-by: Kim Gybels <kgyb...@infogroep.be>
Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 988c45402b..e829cf206d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -461,7 +461,8 @@ static void verify_buffer_safe(struct snapshot *snapshot)
 /*
  * Depending on `mmap_strategy`, either mmap or read the contents of
  * the `packed-refs` file into the snapshot. Return 1 if the file
- * existed and was read, or 0 if the file was absent. Die on errors.
+ * existed and was read, or 0 if the file was absent or empty. Die on
+ * errors.
  */
 static int load_contents(struct snapshot *snapshot)
 {
@@ -492,19 +493,17 @@ static int load_contents(struct snapshot *snapshot)
die_errno("couldn't stat %s", snapshot->refs->path);
size = xsize_t(st.st_size);
 
-   switch (mmap_strategy) {
-   case MMAP_NONE:
+   if (!size) {
+   return 0;
+   } else if (mmap_strategy == MMAP_NONE) {
snapshot->buf = xmalloc(size);
bytes_read = read_in_full(fd, snapshot->buf, size);
if (bytes_read < 0 || bytes_read != size)
die_errno("couldn't read %s", snapshot->refs->path);
snapshot->mmapped = 0;
-   break;
-   case MMAP_TEMPORARY:
-   case MMAP_OK:
+   } else {
snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 
0);
snapshot->mmapped = 1;
-   break;
}
close(fd);
 
-- 
2.14.2

[PATCH 0/6] Yet another approach to handling empty snapshots

This patch series fixes the handling of empty packed-refs snapshots
(i.e., those with `snapshot->buf` and friends equal to `NULL`), partly
by changing `snapshot` to store a pointer to the start of the
post-header `packed-refs` content instead of `header_len`. It makes a
couple of other improvements as well.

I'm not sure whether I like this approach better than the alternative
of always setting `snapshot->buf` to a non-NULL value, by allocating a
length-1 bit of RAM if necessary. The latter is less intrusive, though
even if that approach is taken, I think patches 01, 02, and 04 from
this patch series would be worthwhile improvements.

Michael

Kim Gybels (1):
  packed_ref_cache: don't use mmap() for small files

Michael Haggerty (5):
  struct snapshot: store `start` rather than `header_len`
  create_snapshot(): use `xmemdupz()` rather than a strbuf
  find_reference_location(): make function safe for empty snapshots
  packed_ref_iterator_begin(): make optimization more general
  load_contents(): don't try to mmap an empty file

 refs/packed-backend.c | 106 ++
 1 file changed, 55 insertions(+), 51 deletions(-)

-- 
2.14.2

[PATCH 2/6] create_snapshot(): use `xmemdupz()` rather than a strbuf

It's lighter weight.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index b872267f02..08698de6ea 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -620,8 +620,7 @@ static struct snapshot *create_snapshot(struct 
packed_ref_store *refs)
 
/* If the file has a header line, process it: */
if (snapshot->buf < snapshot->eof && *snapshot->buf == '#') {
-   struct strbuf tmp = STRBUF_INIT;
-   char *p, *eol;
+   char *tmp, *p, *eol;
struct string_list traits = STRING_LIST_INIT_NODUP;
 
eol = memchr(snapshot->buf, '\n',
@@ -631,9 +630,9 @@ static struct snapshot *create_snapshot(struct 
packed_ref_store *refs)
  snapshot->buf,
  snapshot->eof - snapshot->buf);
 
-   strbuf_add(, snapshot->buf, eol - snapshot->buf);
+   tmp = xmemdupz(snapshot->buf, eol - snapshot->buf);
 
-   if (!skip_prefix(tmp.buf, "# pack-refs with:", (const char 
**)))
+   if (!skip_prefix(tmp, "# pack-refs with:", (const char **)))
die_invalid_line(refs->path,
 snapshot->buf,
 snapshot->eof - snapshot->buf);
@@ -653,7 +652,7 @@ static struct snapshot *create_snapshot(struct 
packed_ref_store *refs)
snapshot->start = eol + 1;
 
string_list_clear(, 0);
-   strbuf_release();
+   free(tmp);
}
 
verify_buffer_safe(snapshot);
-- 
2.14.2

[PATCH 6/6] packed_ref_cache: don't use mmap() for small files

From: Kim Gybels <kgyb...@infogroep.be>

Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for
small files, 2010-02-21) and use read() instead of mmap() for small
packed-refs files.

Signed-off-by: Kim Gybels <kgyb...@infogroep.be>
Signed-off-by: Junio C Hamano <gits...@pobox.com>
Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index e829cf206d..8b4b45da67 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -458,6 +458,8 @@ static void verify_buffer_safe(struct snapshot *snapshot)
 last_line, eof - last_line);
 }
 
+#define SMALL_FILE_SIZE (32*1024)
+
 /*
  * Depending on `mmap_strategy`, either mmap or read the contents of
  * the `packed-refs` file into the snapshot. Return 1 if the file
@@ -495,7 +497,7 @@ static int load_contents(struct snapshot *snapshot)
 
if (!size) {
return 0;
-   } else if (mmap_strategy == MMAP_NONE) {
+   } else if (mmap_strategy == MMAP_NONE || size <= SMALL_FILE_SIZE) {
snapshot->buf = xmalloc(size);
bytes_read = read_in_full(fd, snapshot->buf, size);
if (bytes_read < 0 || bytes_read != size)
-- 
2.14.2

[PATCH 3/6] find_reference_location(): make function safe for empty snapshots

This function had two problems if called for an empty snapshot (i.e.,
`snapshot->start == snapshot->eof == NULL`):

* It checked `NULL < NULL`, which is undefined by C (albeit highly
  unlikely to fail in the real world).

* (Assuming the above comparison behaved as expected), it returned
  NULL when `mustexist` was false, contrary to its docstring.

Change the check and fix the docstring.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 08698de6ea..361affd7ad 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -519,9 +519,11 @@ static int load_contents(struct snapshot *snapshot)
  * `refname` starts. If `mustexist` is true and the reference doesn't
  * exist, then return NULL. If `mustexist` is false and the reference
  * doesn't exist, then return the point where that reference would be
- * inserted. In the latter mode, `refname` doesn't have to be a proper
- * reference name; for example, one could search for "refs/replace/"
- * to find the start of any replace references.
+ * inserted, or `snapshot->eof` (which might be NULL) if it would be
+ * inserted at the end of the file. In the latter mode, `refname`
+ * doesn't have to be a proper reference name; for example, one could
+ * search for "refs/replace/" to find the start of any replace
+ * references.
  *
  * The record is sought using a binary search, so `snapshot->buf` must
  * be sorted.
@@ -551,7 +553,7 @@ static const char *find_reference_location(struct snapshot 
*snapshot,
 */
const char *hi = snapshot->eof;
 
-   while (lo < hi) {
+   while (lo != hi) {
const char *mid, *rec;
int cmp;
 
-- 
2.14.2

[PATCH 1/6] struct snapshot: store `start` rather than `header_len`

Store a pointer to the start of the actual references within the
`packed-refs` contents rather than storing the length of the header.
This is more convenient for most users of this field.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 64 ++-
 1 file changed, 33 insertions(+), 31 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 023243fd5f..b872267f02 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -68,17 +68,21 @@ struct snapshot {
int mmapped;
 
/*
-* The contents of the `packed-refs` file. If the file was
-* already sorted, this points at the mmapped contents of the
-* file. If not, this points at heap-allocated memory
-* containing the contents, sorted. If there were no contents
-* (e.g., because the file didn't exist), `buf` and `eof` are
-* both NULL.
+* The contents of the `packed-refs` file:
+*
+* - buf -- a pointer to the start of the memory
+* - start -- a pointer to the first byte of actual references
+ *   (i.e., after the header line, if one is present)
+* - eof -- a pointer just past the end of the reference
+ *   contents
+*
+* If the `packed-refs` file was already sorted, `buf` points
+* at the mmapped contents of the file. If not, it points at
+* heap-allocated memory containing the contents, sorted. If
+* there were no contents (e.g., because the file didn't
+* exist), `buf`, `start`, and `eof` are all NULL.
 */
-   char *buf, *eof;
-
-   /* The size of the header line, if any; otherwise, 0: */
-   size_t header_len;
+   char *buf, *start, *eof;
 
/*
 * What is the peeled state of the `packed-refs` file that
@@ -169,8 +173,7 @@ static void clear_snapshot_buffer(struct snapshot *snapshot)
} else {
free(snapshot->buf);
}
-   snapshot->buf = snapshot->eof = NULL;
-   snapshot->header_len = 0;
+   snapshot->buf = snapshot->start = snapshot->eof = NULL;
 }
 
 /*
@@ -319,13 +322,14 @@ static void sort_snapshot(struct snapshot *snapshot)
size_t len, i;
char *new_buffer, *dst;
 
-   pos = snapshot->buf + snapshot->header_len;
+   pos = snapshot->start;
eof = snapshot->eof;
-   len = eof - pos;
 
-   if (!len)
+   if (pos == eof)
return;
 
+   len = eof - pos;
+
/*
 * Initialize records based on a crude estimate of the number
 * of references in the file (we'll grow it below if needed):
@@ -391,9 +395,8 @@ static void sort_snapshot(struct snapshot *snapshot)
 * place:
 */
clear_snapshot_buffer(snapshot);
-   snapshot->buf = new_buffer;
+   snapshot->buf = snapshot->start = new_buffer;
snapshot->eof = new_buffer + len;
-   snapshot->header_len = 0;
 
 cleanup:
free(records);
@@ -442,14 +445,14 @@ static const char *find_end_of_record(const char *p, 
const char *end)
  */
 static void verify_buffer_safe(struct snapshot *snapshot)
 {
-   const char *buf = snapshot->buf + snapshot->header_len;
+   const char *start = snapshot->start;
const char *eof = snapshot->eof;
const char *last_line;
 
-   if (buf == eof)
+   if (start == eof)
return;
 
-   last_line = find_start_of_record(buf, eof - 1);
+   last_line = find_start_of_record(start, eof - 1);
if (*(eof - 1) != '\n' || eof - last_line < GIT_SHA1_HEXSZ + 2)
die_invalid_line(snapshot->refs->path,
 last_line, eof - last_line);
@@ -495,18 +498,19 @@ static int load_contents(struct snapshot *snapshot)
bytes_read = read_in_full(fd, snapshot->buf, size);
if (bytes_read < 0 || bytes_read != size)
die_errno("couldn't read %s", snapshot->refs->path);
-   snapshot->eof = snapshot->buf + size;
snapshot->mmapped = 0;
break;
case MMAP_TEMPORARY:
case MMAP_OK:
snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 
0);
-   snapshot->eof = snapshot->buf + size;
snapshot->mmapped = 1;
break;
}
close(fd);
 
+   snapshot->start = snapshot->buf;
+   snapshot->eof = snapshot->buf + size;
+
return 1;
 }
 
@@ -539,7 +543,7 @@ static const char *find_reference_location(struct snapshot 
*snapshot,
 * preceding records all have reference names that come
 * *before* `refname`.
 */
-   const char *lo = snapshot->buf + snapshot->header_len;
+   const char *lo =

Re: [PATCH v3] packed_ref_cache: don't use mmap() for small files

On 01/22/2018 08:31 PM, Junio C Hamano wrote:
> Michael Haggerty <mhag...@alum.mit.edu> writes:
> 
>> `snapshot->buf` can still be NULL if the `packed-refs` file didn't exist
>> (see the earlier code path in `load_contents()`). So either that code
>> path *also* has to get the `xmalloc()` treatment, or my third patch is
>> still necessary. (My second patch wouldn't be necessary because the
>> ENOENT case makes `load_contents()` return 0, triggering the early exit
>> from `create_snapshot()`.)
>>
>> I don't have a strong preference either way.
> 
> Which would be a two-liner, like the attached, which does not look
> too bad by itself.
> 
> The direction, if we take this approach, means that we are declaring
> that .buf being NULL is an invalid state for a snapshot to be in,
> instead of saying "an empty snapshot looks exactly like one that was
> freshly initialized", which seems to be the intention of the original
> design.
> 
> After Kim's fix and with 3/3 in your follow-up series, various
> helpers are still unsafe against .buf being NULL, like
> sort_snapshot(), verify_buffer_safe(), clear_snapshot_buffer() (only
> when mmapped bit is set), find_reference_location().
> 
> packed_ref_iterator_begin() checks if snapshot->buf is NULL and
> returns early.  At the first glance, this appears a useful short cut
> to optimize the empty case away, but the check also is acting as a
> guard to prevent a snapshot with NULL .buf from being fed to an
> unsafe find_reference_location().  An implicit guard like this feels
> a bit more brittle than my liking.  If we ensure .buf is never NULL,
> that check can become a pure short-cut optimization and stop being
> a correctness thing.
> 
> So...
> 
> 
>  refs/packed-backend.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index b6e2bc3c1d..1eeb5c7f80 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -473,12 +473,11 @@ static int load_contents(struct snapshot *snapshot)
>   if (fd < 0) {
>   if (errno == ENOENT) {
>   /*
> -  * This is OK; it just means that no
> -  * "packed-refs" file has been written yet,
> -  * which is equivalent to it being empty,
> -  * which is its state when initialized with
> -  * zeros.
> +  * Treat missing "packed-refs" as equivalent to
> +  * it being empty.
>*/
> + snapshot->eof = snapshot->buf = xmalloc(0);
> + snapshot->mmapped = 0;
>   return 0;
>   } else {
>   die_errno("couldn't read %s", snapshot->refs->path);
> 

That would work, though if you go this way, please also change the
docstring for `snapshot::buf`, which still says that `buf` and `eof` can
be `NULL`.

The other alternative, making `snapshot` safe for NULLs, becomes easier
if `snapshot` stores a pointer to the start of the reference section of
the `packed-refs` contents (i.e., after the header line), rather than
repeatedly computing that address from `snapshot->buf +
snapshot->header_len`. With this change, code that is technically
undefined when the fields are NULL can more easily be replaced with code
that is safe for NULL. For example,

pos = snapshot->buf + snapshot->header_len

becomes

pos = snapshot->start

, and

len = snapshot->eof - pos;
if (!len) [...]

becomes

if (pos == snapshot->eof) [...]
len = snapshot->eof - pos;

. In this way, most of the special-casing for NULL goes away (and some
code becomes simpler, as well).

In a moment I'll send a patch series illustrating this approach. I think
patches 01, 02, and 04 are improvements regardless of whether we decide
to make NULL safe.

The change to using `read()` rather than `mmap()` for small
`packed-refs` feels like it should be an improvement, but it occurred to
me that the performance numbers quoted in ea68b0ce9f8 (hash-object:
don't use mmap() for small files, 2010-02-21) are not directly
applicable to the `packed-refs` file. As far as I understand, the file
mmapped in `index_fd()` is always read in full, whereas the main point
of mmapping the packed-refs file is to avoid having to read the whole
file at all in some situations. That being said, a 32 KiB file would
only be 8 pages (assuming a page size of 4 KiB), and by the time you've
read the header and binary-searched to find the desired record, you've
probably paged in most of the file anyway. Reading the whole file at
once, in order, is almost certainly cheaper.

Michael

Re: [PATCH] files_initial_transaction_commit(): only unlock if locked

2018-01-22 Thread Michael Haggerty

On 01/19/2018 11:14 PM, Junio C Hamano wrote:
> Jeff King <p...@peff.net> writes:
> 
>> On Thu, Jan 18, 2018 at 02:38:41PM +0100, Mathias Rav wrote:
>>
>>> Running git clone --single-branch --mirror -b TAGNAME previously
>>> triggered the following error message:
>>>
>>> fatal: multiple updates for ref 'refs/tags/TAGNAME' not allowed.
>>>
>>> This error condition is handled in files_initial_transaction_commit().
>>>
>>> 42c7f7ff9 ("commit_packed_refs(): remove call to `packed_refs_unlock()`", 
>>> 2017-06-23)
>>> introduced incorrect unlocking in the error path of this function,
>>> which changes the error message to
>>>
>>> fatal: BUG: packed_refs_unlock() called when not locked
>>>
>>> Move the call to packed_refs_unlock() above the "cleanup:" label
>>> since the unlocking should only be done in the last error path.
>>
>> Thanks, this solution looks correct to me. It's pretty low-impact since
>> the locking is the second-to-last thing in the function, so we don't
>> have to re-add the unlock to a bunch of error code paths. But one
>> alternative would be to just do:
>>
>>   if (packed_refs_is_locked(refs))
>>  packed_refs_unlock(refs->packed_ref_store);
>>
>> in the cleanup section.
> 
> Yeah, that may be a more future-proof alternative, and just as you
> said the patch as posted would be sufficient, too.

Either solution LGTM. Thanks for finding and fixing this bug.

But let's also take a step back. The invocation

git clone --single-branch --mirror -b TAGNAME

seems curious. Does it even make sense to use `--mirror` and
`--single-branch` at the same time? What should it do?

Normally `--mirror` implies (aside from `--bare`) that the remote
references should be converted 1:1 to local references and should be
overwritten at every fetch; i.e., the refspec should be set to
`+refs/*:refs/*`.

To me the most plausible interpretation of `--mirror --single-branch -b
BRANCHNAME` would be that the single branch should be fetched and made
the HEAD, and the refspec should be set to
`+refs/heads/BRANCHNAME:refs/heads/BRANCHNAME`. It also wouldn't be very
surprising if it were forbidden to use these options together.

Currently, we do neither of those things. Instead we fetch that one
reference (as `refs/heads/BRANCHNAME`) but set the refspec to
`+refs/*:refs/*`; i.e., the next fetch would fetch all of the history.

It's even more mind-bending if `-b` is passed a `TAGNAME` rather than a
`BRANCHNAME`. The documentation says that `-b TAGNAME` "detaches the
HEAD at that commit in the resulting repository". If `--single-branch -b
TAGNAME` is used, then the refspec is set to
`+refs/tags/TAGNAME:refs/tags/TAGNAME`. But what if `--mirror` is also used?

Currently, this fails, apparently because `--mirror` and `-b TAGNAME`
each independently try to set `refs/tags/TAGNAME` (presumably to the
same value). *If* this is a useful use case, we could fix it so that it
doesn't fail. If not, maybe we should prohibit it explicitly and emit a
clearer error message.

Mathias: if you encountered this problem in the real world, what were
you trying to accomplish? What behavior would you have expected?

Maybe the behavior could be made more sane if there were a way to get
the 1:1 reference mapping that `--mirror` implies without also getting
`--bare` [1]. Suppose there were a `--refspec` option. Then instead of

git clone --mirror --single-branch -b BRANCHNAME

with it's non-obvious semantics, you could prohibit that use and instead
support

git clone --bare
--refspec='+refs/heads/BRANCHNAME:refs/heads/BRANCHNAME'

which seems clearer in its intent, if perhaps not super obvious. Or you
could give `clone` a `--no-fetch` option, which would give the user a
time to intervene between setting up the basic clone config and actually
fetching objects.

Michael

[1] It seems like

git clone --config remote.origin.fetch='+refs/*:refs/*' clone ...

might do it, but that actually ends up setting up two refspecs and
only honoring `+refs/heads/*:refs/remotes/origin/*` for the initial
fetch. Plus it is pretty obscure.

Re: [PATCH v3] packed_ref_cache: don't use mmap() for small files

2018-01-20 Thread Michael Haggerty

On 01/17/2018 11:09 PM, Jeff King wrote:
> On Tue, Jan 16, 2018 at 08:38:15PM +0100, Kim Gybels wrote:
> 
>> Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for
>> small files, 2010-02-21) and use read() instead of mmap() for small
>> packed-refs files.
>>
>> This also fixes the problem[1] where xmmap() returns NULL for zero
>> length[2], for which munmap() later fails.
>>
>> Alternatively, we could simply check for NULL before munmap(), or
>> introduce xmunmap() that could be used together with xmmap(). However,
>> always setting snapshot->buf to a valid pointer, by relying on
>> xmalloc(0)'s fallback to 1-byte allocation, makes using snapshots
>> easier.
>>
>> [1] https://github.com/git-for-windows/git/issues/1410
>> [2] Logic introduced in commit 9130ac1e196 (Better error messages for
>> corrupt databases, 2007-01-11)
>>
>> Signed-off-by: Kim Gybels <kgyb...@infogroep.be>
>> ---
>>
>> Change since v2: removed separate case for zero length as suggested by Peff,
>> ensuring that snapshot->buf is always a valid pointer.
> 
> Thanks, this looks fine to me (I'd be curious to hear from Michael if
> this eliminates the need for the other patches).

`snapshot->buf` can still be NULL if the `packed-refs` file didn't exist
(see the earlier code path in `load_contents()`). So either that code
path *also* has to get the `xmalloc()` treatment, or my third patch is
still necessary. (My second patch wouldn't be necessary because the
ENOENT case makes `load_contents()` return 0, triggering the early exit
from `create_snapshot()`.)

I don't have a strong preference either way.

Michael

Re: [BUG] git remote prune removes local tags, depending on fetch config

2018-01-15 Thread Michael Giuffrida

Just to confirm, you're talking about the behavior of removing *all*
tags that aren't found on the remote? (Not just tags that used to be
on some remote, but have since been removed from that remote.) So,
with your proposed workflow, you would never create your own tags
locally, without pushing them to the remote before running `git fetch`
-- otherwise they would simply be deleted.

It doesn't seem like a useful feature -- you wouldn't expect `git
fetch --prune` to remove your local branches that you were developing
on, right?

On Mon, Jan 15, 2018 at 4:38 PM, Ævar Arnfjörð Bjarmason
<ava...@gmail.com> wrote:
>
> On Mon, Jan 15 2018, Michael Giuffrida jotted:
>
>> `git remote prune ` should "delete all stale remote-tracking
>> branches under ". I was surprised to discover, after some
>> troubleshooting, that it also deletes *all* local tags that don't
>> exist on the remote, if the following refspec is included in the
>> remote's fetch config:
>>
>> +refs/tags/*:refs/tags/*
>>
>> So, if `remote.origin.fetch` is configured to fetch all tags from the
>> remote, any tags I create locally will be deleted when running `git
>> remote prune origin`. This is not intuitive [1], nor is is it
>> explained in the docs [2]. Is this behavior obvious to someone with a
>> better understanding of Git internals?
>>
>> I did find a better way to automatically fetch tags (using tagopt
>> instead of adding the fetch refspec). However, the refspec doesn't
>> seem "wrong" in itself; in particular, `git fetch --tags` used to be
>> considered equivalent to specifying the refspec
>> "refs/tags/*:refs/tags/*" -- implying that this is a sensible refspec
>> [3]. So I wouldn't expect it to "break" the behavior of another
>> command.
>>
>> [1] https://stackoverflow.com/q/34687657/1327867
>> [2] https://git-scm.com/docs/git-remote.html#git-remote-empruneem
>> [3] 
>> https://github.com/git/git/commit/c5a84e92a2fe9e8748e32341c344d7a6c0f52a50
>
> These docs are really confusing, but it is working as intended, and
> really should be re-documented.
>
> The `git remote prune` subcommand just ends up piggy-backing on
> git-fetch, whose behavior is explained here:
> https://git-scm.com/docs/git-fetch.html#git-fetch---prune
>
> It's worked this way since at least v1.8.5.6, maybe at some distant
> point in the past it only did this for branches when invoked via
> git-remote as the documentation says.
>
> RELATED:
>
> I've actually had the reverse problem with this. I want some way to turn
> this behavior on without explicitly hacking the refspec, so I can do it
> globally in /etc/gitconfig or in ~/.gitconfig without screwing with the
> config of each checkout on certain machines.
>
> You can set fetch.prune=true, but that only prunes the branches, you
> need to inject remote.origin.fetch into each checkout, unless I've
> missed some way of doing this.
>
> I wanted to add fetch.pruneTags that would make it as if you had
> refs/tags/*:refs/tags/* in the fetch spec, but I haven't hacked that up
> yet, if anyone can see any inherent issue with that plan I'd like to
> know about it.

[BUG] git remote prune removes local tags, depending on fetch config

2018-01-15 Thread Michael Giuffrida

`git remote prune ` should "delete all stale remote-tracking
branches under ". I was surprised to discover, after some
troubleshooting, that it also deletes *all* local tags that don't
exist on the remote, if the following refspec is included in the
remote's fetch config:

+refs/tags/*:refs/tags/*

So, if `remote.origin.fetch` is configured to fetch all tags from the
remote, any tags I create locally will be deleted when running `git
remote prune origin`. This is not intuitive [1], nor is is it
explained in the docs [2]. Is this behavior obvious to someone with a
better understanding of Git internals?

I did find a better way to automatically fetch tags (using tagopt
instead of adding the fetch refspec). However, the refspec doesn't
seem "wrong" in itself; in particular, `git fetch --tags` used to be
considered equivalent to specifying the refspec
"refs/tags/*:refs/tags/*" -- implying that this is a sensible refspec
[3]. So I wouldn't expect it to "break" the behavior of another
command.

[1] https://stackoverflow.com/q/34687657/1327867
[2] https://git-scm.com/docs/git-remote.html#git-remote-empruneem
[3] https://github.com/git/git/commit/c5a84e92a2fe9e8748e32341c344d7a6c0f52a50

[PATCH 2/3] create_snapshot(): exit early if the file was empty

If the `packed_refs` files is entirely empty (i.e., not even a header
line), then `load_contents()` returns 1 even though `snapshot->buf`
and `snapshot->eof` both end up set to NULL. In that case, the
subsequent processing that `create_snapshot()` does is unnecessary,
and also involves computing `NULL - NULL` and `NULL + 0`, which are
probably safe in real life but are technically undefined in C.

Sidestep both issues by exiting early if `snapshot->buf` is NULL.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index f20f05b4df..36796d65f0 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -613,7 +613,7 @@ static struct snapshot *create_snapshot(struct 
packed_ref_store *refs)
acquire_snapshot(snapshot);
snapshot->peeled = PEELED_NONE;
 
-   if (!load_contents(snapshot))
+   if (!load_contents(snapshot) || !snapshot->buf)
return snapshot;
 
/* If the file has a header line, process it: */
-- 
2.14.2

[PATCH 3/3] find_reference_location(): don't invoke if `snapshot->buf` is NULL

If `snapshot->buf` is NULL, then `find_reference_location()` has two
problems:

1. It relies on behavior that is technically undefined in C, such as
   computing `NULL + 0`.

2. It returns NULL if the reference doesn't exist, even if `mustexist`
   is not set. This problem doesn't come up in the current code,
   because we never call this function with `snapshot->buf == NULL`
   and `mustexist` set. But it is something that future callers need
   to be aware of.

We could fix the first problem by adding some extra logic to the
function. But considering both problems together, it is more
straightforward to document that the function should only be called if
`snapshot->buf` is non-NULL.

Adjust `packed_read_raw_ref()` to return early if `snapshot->buf` is
NULL rather than calling `find_reference_location()`.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 36796d65f0..ed2b396bef 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -521,8 +521,9 @@ static int load_contents(struct snapshot *snapshot)
  * reference name; for example, one could search for "refs/replace/"
  * to find the start of any replace references.
  *
+ * This function must only be called if `snapshot->buf` is non-NULL.
  * The record is sought using a binary search, so `snapshot->buf` must
- * be sorted.
+ * also be sorted.
  */
 static const char *find_reference_location(struct snapshot *snapshot,
   const char *refname, int mustexist)
@@ -728,6 +729,12 @@ static int packed_read_raw_ref(struct ref_store *ref_store,
 
*type = 0;
 
+   if (!snapshot->buf) {
+   /* There are no packed references */
+   errno = ENOENT;
+   return -1;
+   }
+
rec = find_reference_location(snapshot, refname, 1);
 
if (!rec) {
-- 
2.14.2

[PATCH 1/3] SQUASH? Mention that `snapshot::buf` can be NULL for empty files

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 01a13cb817..f20f05b4df 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -69,11 +69,11 @@ struct snapshot {
 
/*
 * The contents of the `packed-refs` file. If the file was
-* already sorted, this points at the mmapped contents of the
-* file. If not, this points at heap-allocated memory
-* containing the contents, sorted. If there were no contents
-* (e.g., because the file didn't exist), `buf` and `eof` are
-* both NULL.
+* already sorted and if mmapping is allowed, this points at
+* the mmapped contents of the file. If not, this points at
+* heap-allocated memory containing the contents, sorted. If
+* there were no contents (e.g., because the file didn't exist
+* or was empty), `buf` and `eof` are both NULL.
 */
char *buf, *eof;
 
-- 
2.14.2

[PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files"

Thanks for your patch. I haven't measured the performance difference
of `mmap()` vs. `read()` for small `packed-refs` files, but it's not
surprising that `read()` would be faster.

I especially like the fix for zero-length `packed-refs` files. (Even
though AFAIK Git never writes such files, they are totally legitimate
and shouldn't cause Git to fail.) With or without the additions
mentioned below,

Reviewed-by: Michael Haggerty <mhag...@alum.mit.edu>

While reviewing your patch, I realized that some areas of the existing
code use constructs that are undefined according to the C standard,
such as computing `NULL + 0` and `NULL - NULL`. This was already wrong
(and would come up more frequently after your change). Even though
these are unlikely to be problems in the real world, it would be good
to avoid them.

So I will follow up this email with three patches:

1. Mention that `snapshot::buf` can be NULL for empty files

   I suggest squashing this into your patch, to make it clear that
   `snapshot::buf` and `snapshot::eof` can also be NULL if the
   `packed-refs` file is empty.

2. create_snapshot(): exit early if the file was empty

   Avoid undefined behavior by returning early if `snapshot->buf` is
   NULL.

3. find_reference_location(): don't invoke if `snapshot->buf` is NULL

   Avoid undefined behavior and confusing semantics by not calling
   `find_reference_location()` when `snapshot->buf` is NULL.

Michael

Michael Haggerty (3):
  SQUASH? Mention that `snapshot::buf` can be NULL for empty files
  create_snapshot(): exit early if the file was empty
  find_reference_location(): don't invoke if `snapshot->buf` is NULL

 refs/packed-backend.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

-- 
2.14.2

Re: [PATCH] refs: drop "clear packed-refs while locked" assertion

2018-01-01 Thread Michael Haggerty

On Fri, Dec 8, 2017 at 12:22 PM, Jeff King <p...@peff.net> wrote:
> This patch fixes a regression in v2.14.0. It's actually fixed already in
> v2.15.0 because all of the packed-ref code there was rewritten. So
> there's no point in applying this on "master" or even "maint". But I
> figured it was worth sharing here in case somebody else runs across it,
> and in case we ever do a v2.14.4 release.

I forgot to respond to this. +1

Reviewed-by: Michael Haggerty <mhag...@alum.mit.edu>

Michael

> -- >8 --
> In clear_packed_ref_cache(), we assert that we're not
> currently holding the packed-refs lock. But in each of the
> three code paths that can hit this, the assertion is either
> a noop or actively does the wrong thing:
>
>  1. in rollback_packed_refs(), we will have just released
> the lock before calling the function, and so the
> assertion can never trigger.
>
>  2. get_packed_ref_cache() can reach this assertion via
> validate_packed_ref_cache(). But it calls the validate
> function only when it knows that we're not holding the
> lock, so again, the assertion can never trigger.
>
>  3. lock_packed_refs() also calls validate_packed_ref_cache().
> In this case we're _always_ holding the lock, which
> means any time the validate function has to clear the
> cache, we'll trigger this assertion and die.
>
> This doesn't happen often in practice because the
> validate function clears the cache only if we find that
> somebody else has racily rewritten the packed-refs file
> between the time we read it and the time we took the lock.
>
> So most of the time we don't reach the assertion at all
> (nobody has racily written the file so there's no need
> to clear the cache). And when we do, it is not actually
> indicative of a bug; clearing the cache while holding
> the lock is the right thing to do here.
>
> This final case is relatively new, being triggerd by the
> extra validation added in fed6ebebf1 (lock_packed_refs():
> fix cache validity check, 2017-06-12).
>
> Signed-off-by: Jeff King <p...@peff.net>
> ---
>  refs/files-backend.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index f21a954ce7..dd41e1d382 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -99,8 +99,6 @@ static void clear_packed_ref_cache(struct files_ref_store 
> *refs)
> if (refs->packed) {
> struct packed_ref_cache *packed_refs = refs->packed;
>
> -   if (is_lock_file_locked(>packed_refs_lock))
> -   die("BUG: packed-ref cache cleared while locked");
> refs->packed = NULL;
> release_packed_ref_cache(packed_refs);
> }
> --
> 2.15.1.659.g8bd2eae3ea

Frau Ruth Michael

2017-12-15 Thread Mrs Ruth Michael

Mein Liebster im Herrn,

Grüße an Sie im Namen unseres Herrn Jesus Christus. Es war nach dem Durchlaufen 
Ihres Profils. Ich bin bewegt, Sie mit diesem wichtigen Thema in Kontakt zu 
bringen, mit dem Glauben und der Zuversicht, dass Sie ein guter Assistent 
dieses Werkes Gottes sein werden. Zunächst möchte ich mich Ihnen vorstellen. 
Ich bin Mrs. Ruth Michael aus Kuwait. Ich war mit dir verheiratet, Peter 
Michael. Wer hat 15 Jahre lang mit der kuwaitischen Botschaft in Côte d'Ivoire 
gearbeitet, bevor er 2010 starb?

Wir waren 13 Jahre ohne Kind verheiratet. Er starb nach einer kurzen Krankheit, 
die nur vier Tage dauerte. Bevor wir starben, waren wir beide stark gläubig. 
Seit seinem Tod habe ich beschlossen, nicht wieder zu heiraten oder ein Kind 
aus meiner Heirat zu bekommen, gegen das die Bibel ist. Als mein verstorbener 
Ehemann noch lebte, hinterlegte er den Betrag von 10,5 Millionen Dollar in 
einer Bank hier in Abidjan, dem ökonomischen Kapital der Elfenbeinküste für das 
Sorgerecht. Derzeit hat mein Arzt bestätigt, dass ich eine schwere Krankheit 
habe, die Krebserkrankung ist.

Was mich am meisten stört, ist mein Schlaganfall. Haven hat meine Bedingung 
erfüllt. Ich habe beschlossen, diesen Fonds an eine Gemeinde oder Einzelperson 
zu spenden, die dieses Geld in der Art und Weise verwenden wird, wie ich es 
hier unterrichten werde. Ich möchte eine Kirche, die diesen Fonds für 
Waisenhäuser nutzt, den Witwen hilft, das Wort Gottes verbreitet und sich 
bemüht, das Haus Gottes aufrechtzuerhalten. Die Bibel hat uns verständlich 
gemacht, dass die Hand, die gibt, gesegnet ist, ich habe diese Entscheidung 
getroffen, weil ich keinen Sohn habe, der dieses Geld erben wird, und die 
Verwandten meines Mannes sind keine Christen und ich möchte nicht, dass die 
Bemühungen meines Mannes von Ungläubigen benutzt werden.

Ich will keine Situation, in der dieses Geld gottlos eingesetzt wird. Darum 
treffe ich diese Entscheidung. Ich habe keine Angst vor dem Tod, also weiß ich, 
wohin ich gehe. Ich weiß, dass ich im Herzen des Herrn sein werde. Exodus 14 VS 
14 besagt, dass Sie meinen Fall bekämpfen werden und ich werde meinen Frieden 
behalten. Sobald ich Ihre Antwort erhalten habe, werde ich Ihnen hier in 
Abidjan den Kontakt zur Sicherheitsfirma geben. Ich möchte, dass du und die 
Kirche immer für mich beten, weil du mein Pastor bist. Mein Glück ist, dass ich 
ein Leben eines würdigen Christen gelebt habe. Wer dem Herrn dienen will, muss 
ihm im Geist und in der Wahrheit dienen.

Sei bitte immer dein Leben lang fromm. Jede Verzögerung in Ihrer Antwort wird 
mir Raum geben, eine andere Kirche oder Einzelperson für diesen gleichen Zweck 
zu beschaffen. Bitte versichere mir, dass du nach dem handeln wirst, was ich 
hier gesagt habe.

Warten auf Ihre Antwort.

Bleibe gesegnet im Herrn.

Dein in Christus,
Frau Ruth Michael

Re: [PATCH v3] Makefile: replace perl/Makefile.PL with simple make rules

2017-12-15 Thread Michael J Gruber

Ævar Arnfjörð Bjarmason venit, vidit, dixit 12.12.2017 23:26:
> 
> On Tue, Dec 12 2017, Randall S. Becker jotted:
> 
>> -Original Message-
>> On December 10, 2017 4:14 PM, Ævar Arnfjörð Bjarmason wrote:
>> Subject: [PATCH v3] Makefile: replace perl/Makefile.PL with simple make rules
>>
>>> Replace the perl/Makefile.PL and the fallback perl/Makefile used under 
>>> NO_PERL_MAKEMAKER=NoThanks with a much simpler implementation heavily 
>>> inspired by how the i18n infrastructure's build process works[1].
>>> The reason for having the Makefile.PL in the first place is that it was 
>>> initially[2] building a perl C binding to interface with libgit, this 
>>> functionality, that was removed[3] before Git.pm ever made it to the master 
>>> branch.
>> 
>>
>> I would like to request that the we be careful that the git builds do not 
>> introduce arbitrary dependencies to CPAN. Some platforms (I can think of one 
>> off the top, being NonStop) does not provide for arbitrary additions to the 
>> supplied perl implementation as of yet. The assumption about being able to 
>> add CPAN modules may apply on some platforms but is not a general 
>> capability. I am humbly requesting that caution be used when adding 
>> dependencies. Being non-$DAYJOB responsible for the git port for NonStop, 
>> this scares me a bit, but I and my group can help validate the available 
>> modules used for builds.
>>
>> Note: we do not yet have CPAN's SCM so can't and don't use perl for access 
>> to git anyway - much that I've tried to change that.
>>
>> Please keep build dependencies to a minimum.
>>
>> Thanks for my and my whole team.
> 
> I think you should be happy with this patch then, and it doesn't add any
> more CPAN dependency than before, and sets up a framework (as discussed
> in [1]) where we can use more CPAN modules while not requiring packagers
> such as yourself to package CPAN modules.
> 
> However, it doesn't sound believable to me that even on NonStop you
> can't install any CPAN modules whatsoever.
> 
> That would also mean that this patch doesn't work for you, because it
> means that you either don't have anything resembling a hierarchical
> filesystem on which git can be installed in the first place (in which
> case it wouldn't work), or perl doesn't have an @INC to search through
> perl libs on on NonStop. What does:
> 
> perl -V
> 
> Return for you on that system?
> 
> If this patch works, and if at the bottom of `perl -V` you see some
> directories which you could write a package to drop some static *.pm
> files, then you can grab a *.tar.gz from CPAN such as the one for
> Error.pm[2] and arrange for the *.pm files contained within its lib/
> directory to be dropped into one of those @INC directories.
> 
> It may be that some aspect of the CPAN toolchain is broken for you, or
> even ExtUtils::MakeMaker, but you typically don't need that to package
> non-XS perl modules, certainly not any of the ones we've discussed
> possibly bundling up in git.git on-list recently. As a (very occasional)
> contributor to perl.git I'd be interested to know if that's what you
> mean is broken, and if so see if it could be fixed for you.
> 
> 1.

Re: Problem with environment of hook execution when git is run with --work-tree / --git-dir

2017-11-26 Thread Michael Sloan

On Sun, Nov 26, 2017 at 5:16 PM, Junio C Hamano <gits...@pobox.com> wrote:
> Michael Sloan <mgsl...@gmail.com> writes:
>
>> So what's the problem with this choice of environment variables?
>> Well, the problem is that if PWD is changed during the execution of
>> the script, then GIT_WORK_TREE and GIT_DIR will no longer work
>> properly. For example, if the pre-commit hook is
>>
>> #!/bin/sh
>> cd some-dir
>> git status
>>
>> This will fail with
>>
>> Not a git repository: '.dotfiles-git'
>
> That is to be expected.  It's up to the script to make them absolute
> if it cannot cope with relative paths.
>
>> There is another detail here, which is that when --git-dir /
>> --work-tree is not specified, the no GIT_WORK_TREE / GIT_DIR
>> environment variable is set.  This means that in this case, changing
>> PWD in the hook will work fine as long as the search for .git will
>> find the right one.
>
> That also is working as designed.

Hmm, I do not think that this is good for the reliability of hooks.
It means that hooks written with the common case assumption (no
GIT_WORK_TREE / GIT_DIR set) will fail when run in these rarer
configurations.  I imagine that many authors of hooks are entirely
unaware of work-tree / git-dir options.  I know that I used git for
something like 8 years before encountering a use for them, or really
being aware.  Perhaps hooks authors are savvier than your average
user.

It seems to me like something similar to my suggested GIT_DIR_MAPPINGS
could be quite powerful for this circumstance as well as others.  I
guess a temporary hack would be to create a ".git" file that specifies
the git-dir, but this doesn't work if something is already called
".git".

I do not have a concrete example of this causing problems in practice,
I've just observed that it is a potential gotcha for --git-dir +
hooks.  I can understand keeping things simple even if it means making
it much harder for hooks authors to write correct code.  It seems
bothersome to me that git hooks have to be crafted so carefully to
support variations in environment.

If we could go back to when hooks were introduced and add a
"GIT_IN_HOOK=1" and have it require manual specification of
--work-tree and -git-dir, that might have been the best option.
However, doing that now would break everyone's hooks, so not really
practical.

Also, thanks for all your work on git, overall it is excellent software :)

-Michael

Problem with environment of hook execution when git is run with --work-tree / --git-dir

2017-11-25 Thread Michael Sloan

Hi!

I noticed a potential bug with the invocation of a pre-commit hook
when running git with --work-tree and --git-dir.  In particular, I was
investigating how hooks can still run git commands properly even when
the work-tree or git-dir is overridden via CLI args.  I put the
following in "/home/mgsloan/.dotfiles-git/hooks/pre-commit":

#!/bin/sh
env

after this, running "git --work-tree=/home/mgsloan
--git-dir=/home/mgsloan/.dotfiles-git commit" has output with a bunch
of variables, here are the important ones:

GIT_WORK_TREE=.
GIT_DIR=.dotfiles-git/
PWD=/home/mgsloan

So what's the problem with this choice of environment variables?
Well, the problem is that if PWD is changed during the execution of
the script, then GIT_WORK_TREE and GIT_DIR will no longer work
properly. For example, if the pre-commit hook is

#!/bin/sh
cd some-dir
git status

This will fail with

Not a git repository: '.dotfiles-git'

There is another detail here, which is that when --git-dir /
--work-tree is not specified, the no GIT_WORK_TREE / GIT_DIR
environment variable is set.  This means that in this case, changing
PWD in the hook will work fine as long as the search for .git will
find the right one.  Note that this also means that changing PWD in a
script can change which git repo the command is being run on, for
example, when the hook is interacting with a submodule.

A half-fix to this would be to have the GIT_WORK_TREE and GIT_DIR set
when running hooks use absolute paths.  However, this would not have
the same behavior as when git is used without --git-dir / --work-tree.
As described in the paragraph above, if PWD is relied upon to instead
target a different git repo, then things break.

Not sure what the total fix for this would be.   I think the
information that needs to be conveyed to the hook's git invocations is
that "the work-tree /home/mgsloan should be associated with the
git-dir /home/mgsloan/.dotfiles-git".  Could have an env var like

GIT_DIR_MAPPINGS="/home/mgsloan!/home/mgsloan/.dotfiles-git"

The idea is that this would be a list of mappings from GIT_WORK_TREE
to GIT_DIR.  If this variable is set, then it will be followed when
git is searching parents of PWD for ".git" directories.  I chose "!"
rather arbitrarily here.  "->" would look nicer, but people might
forget to escape it when programmatically setting this var.

What do y'all think of this idea?

Some of you might be wondering what I'm doing with my work tree being
my home directory.  It is the approach suggested here -
https://developer.atlassian.com/blog/2016/02/best-way-to-store-dotfiles-git-bare-repo/
- for versioning your configuration files directly.

Apologies if this has already been discussed, I could not find a good
way to search the mailinglist archives.

Thanks!
-Michael

Re: [PATCH 2/3] merge-base: return fork-point outside reflog

2017-11-08 Thread Michael J Gruber

Ekelhart Jakob venit, vidit, dixit 08.11.2017 09:52:
> Thank you for all the effort to fix this issue. Unfortunately, we are still 
> suffering from this and our workaround just stopped being sufficient.
> 
> We were wondering if there is any way to tell when this fix will be released?
> 
> BR Jakob

Soon (TM) :)

Term start kept me busy, but I'll try and resume dangling topics this
week or next.

It seems the consensus was that current functionality is as designed but
not necessarily as expected, and another mode "--fork-base" (that does
what I suggested as "fix") would meet these expectations. I would reuse
the documentation of the current mode as a description of the new mode
and add documentation for the existing mode ;)

Michael

Re: [PATCH v1 2/2] log: add option to choose which refs to decorate

On 11/05/2017 07:17 AM, Junio C Hamano wrote:
> Junio C Hamano <gits...@pobox.com> writes:
>> Rafael Ascensão <rafa.al...@gmail.com> writes:
>> ...
>>> Because changing the default behavior of that function has
>>> implications on multiple commands which I think shouldn't change. But
>>> at the same time, would be nice to have the logic that deals with
>>> glob-ref patterns all in one place.
>>>
>>> What's the sane way to do this?
>>
>> Learn to type "--decorate-refs="refs/heads/[m]aster", and not twewak
>> the code at all, perhaps.  The users of existing "with no globbing,
>> /* is appended" interface are already used to that way and they do
>> not have to learn a new and inconsistent interface.
>>
>> After all, "I only want to see 'git log' output with 'master'
>> decorated" (i.e. not specifying "this class of refs I can glob by
>> using the naming convention I am using" and instead enumerating the
>> ones you care about) does not sound like a sensible thing people
>> often want to do, so making it follow the other codepath so that
>> people can say "refs/tags" to get "refs/tags/*", while still allowing
>> such a rare but specific and exact one possible, may not sound too
>> bad to me.
> 
> Having said all that, I can imagine another way out might be to
> change the behaviour of this "normalize" thing to add two patterns,
> the original pattern in addition to the original pattern plus "/*",
> when it sees a pattern without any glob.  Many users who relied on
> the current behaviour fed "refs/tags" knowing that it will match
> everything under "refs/tags" i.e. "refs/tags/*", and they cannot
> have a ref that is exactly "refs/tags", so adding the original
> pattern without an extra trailing "/*" would not hurt them.  And
> this will allow you to say "refs/heads/master" when you know you
> want that exact ref, and in such a repository where that original
> pattern without trailing "/*" would be useful, because you cannot
> have "refs/heads/master/one" at the same time, having an extra
> pattern that is the original plus "/*" would not hurt you, either.
> 
> This however needs a bit of thought to see if there are corner cases
> that may result in unexpected and unwanted fallout, and something I
> am reluctant to declare unilaterally that it is a better way to go.

There's some glob-matching code (somewhere? I don't know if it's allowed
everywhere) that allows "**" to mean "zero or one path components. If
"refs/tags" were massaged to be "refs/tags/**", then it would match not only

refs/tags
refs/tags/foo

but also

refs/tags/foo/bar

, which is probably another thing that the user would expect to see.

There's at least some precedent for this kind of expansion: `git
for-each-ref refs/remotes` lists *all* references under that prefix,
even if they have multiple levels.

Michael

Re: [PATCH v1 1/2] refs: extract function to normalize partial refs

On 11/06/2017 02:23 AM, Junio C Hamano wrote:
> Michael Haggerty <mhag...@alum.mit.edu> writes:
> 
>> [1] I say "almost entirely" because putting them in one function means
>> that only `pattern` needs to be scanned for glob characters. But that is
>> an unimportant detail.
> 
> That could actually be an important detail, in that even if prefix
> has wildcard, we'd still append the trailing "/*" as long as the
> pattern does not, right?

That's correct, but I was assuming that the prefix would always be a
hard-coded string like "refs/tags/" or maybe "refs/". (That is the case
now.) It doesn't seem very useful to use a prefix like "refs/*/".

Michael

Re: [PATCH v1 1/2] refs: extract function to normalize partial refs

.

So the interface might be simplified by having two functions,

void normalize_glob_ref(normalized_pattern, prefix, pattern);
void ensure_blob(struct strbuf *pattern);

The caller in this patch would call the functions one after the other
(or the `ensure_blob` behavior could be inlined in
`for_each_glob_ref_in()`, since it doesn't yet have any callers). And
the callers introduced in patch 2 would only need to call the first
function.

>  static inline const char *has_glob_specials(const char *pattern)
>  {
>   return strpbrk(pattern, "?*[");
> 

Michael

[1] I say "almost entirely" because putting them in one function means
that only `pattern` needs to be scanned for glob characters. But that is
an unimportant detail.

Re: [PATCH v1 1/2] refs: extract function to normalize partial refs

On 11/04/2017 11:45 PM, Kevin Daudt wrote:
> On Sat, Nov 04, 2017 at 11:27:39AM +0900, Junio C Hamano wrote:
>> I however notice that addition of /* to the tail is trying to be
>> careful by using strbuf_complete('/'), but prefixing with "refs/"
>> does not and we would end up with a double-slash if pattern begins
>> with a slash.  The contract between the caller of this function (or
>> its original, which is for_each_glob_ref_in()) and the callee is
>> that prefix must not begin with '/', so it may be OK, but we might
>> want to add "if (*pattern == '/') BUG(...)" at the beginning.
>>
>> I dunno.  In any case, that is totally outside the scope of this two
>> patch series.
> 
> I do think it's a good idea to make future readers of the code aware of
> this contract, and adding a BUG assert does that quite well. Here is a
> patch that implements it.
> 
> This applies of course on top of this patch series.
> 
> -- >8 --
> Subject: [PATCH] normalize_glob_ref: assert implicit contract of prefix
> 
> normalize_glob_ref has an implicit contract of expecting 'prefix' to not
> start with a '/', otherwise the pattern would end up with a
> double-slash.
> 
> Mark it as a BUG when the prefix argument of normalize_glob_ref starts
> with a '/' so that future callers will be aware of this contract.
> 
> Signed-off-by: Kevin Daudt <m...@ikke.info>
> ---
>  refs.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/refs.c b/refs.c
> index e9ae659ae..6747981d1 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -372,6 +372,8 @@ int head_ref_namespaced(each_ref_fn fn, void *cb_data)
>  void normalize_glob_ref(struct strbuf *normalized_pattern, const char 
> *prefix,
>   const char *pattern, int flags)
>  {
> + if (prefix && *prefix == '/') BUG("prefix cannot not start with '/'");

This should be split onto two lines.

Also, "prefix cannot not start ..." has two "not". I suggest changing it
to "prefix must not start ...", because that makes it clearer that the
caller is at fault.

What if the caller passes the empty string as prefix? In that case, the
end result would be "/", which is also bogus.

> +
>   if (!prefix && !starts_with(pattern, "refs/"))
>   strbuf_addstr(normalized_pattern, "refs/");
>   else if (prefix)

Michael

[PATCH v2 2/9] prune_ref(): call `ref_transaction_add_update()` directly

`prune_ref()` needs to use the `REF_ISPRUNING` flag, but we want to
make that flag private to the files backend. So instead of calling
`ref_transaction_delete()`, which is a public function and therefore
shouldn't allow the `REF_ISPRUNING` flag, change `prune_ref()` to call
`ref_transaction_add_update()`, which is private to the refs
module. (Note that we don't need any of the other services provided by
`ref_transaction_delete()`.)

This allows us to change `ref_transaction_update()` to reject the
`REF_ISPRUNING` flag. Do so by adjusting
`REF_TRANSACTION_UPDATE_ALLOWED_FLAGS`. Also add parentheses to its
definition to avoid potential future mishaps.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs.h   |  4 +---
 refs/files-backend.c | 25 -
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/refs.h b/refs.h
index 15fd419c7d..4ffef9502d 100644
--- a/refs.h
+++ b/refs.h
@@ -349,9 +349,7 @@ int refs_pack_refs(struct ref_store *refs, unsigned int 
flags);
  * Flags that can be passed in to ref_transaction_update
  */
 #define REF_TRANSACTION_UPDATE_ALLOWED_FLAGS \
-   REF_ISPRUNING |  \
-   REF_FORCE_CREATE_REFLOG |\
-   REF_NODEREF
+   (REF_NODEREF | REF_FORCE_CREATE_REFLOG)
 
 /*
  * Setup reflog before using. Fill in err and return -1 on failure.
diff --git a/refs/files-backend.c b/refs/files-backend.c
index fadf1036d3..ba72d28b13 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -989,22 +989,29 @@ static void prune_ref(struct files_ref_store *refs, 
struct ref_to_prune *r)
 {
struct ref_transaction *transaction;
struct strbuf err = STRBUF_INIT;
+   int ret = -1;
 
if (check_refname_format(r->name, 0))
return;
 
transaction = ref_store_transaction_begin(>base, );
-   if (!transaction ||
-   ref_transaction_delete(transaction, r->name, >oid,
-  REF_ISPRUNING | REF_NODEREF, NULL, ) ||
-   ref_transaction_commit(transaction, )) {
-   ref_transaction_free(transaction);
+   if (!transaction)
+   goto cleanup;
+   ref_transaction_add_update(
+   transaction, r->name,
+   REF_NODEREF | REF_HAVE_NEW | REF_HAVE_OLD | 
REF_ISPRUNING,
+   _oid, >oid, NULL);
+   if (ref_transaction_commit(transaction, ))
+   goto cleanup;
+
+   ret = 0;
+
+cleanup:
+   if (ret)
error("%s", err.buf);
-   strbuf_release();
-   return;
-   }
-   ref_transaction_free(transaction);
strbuf_release();
+   ref_transaction_free(transaction);
+   return;
 }
 
 /*
-- 
2.14.1

[PATCH v2 4/9] ref_transaction_add_update(): remove a check

We want to make `REF_ISPRUNING` internal to the files backend. For
this to be possible, `ref_transaction_add_update()` mustn't know about
it. So move the check that `REF_ISPRUNING` is only used with
`REF_NODEREF` from this function to `files_transaction_prepare()`.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs.c   | 3 ---
 refs/files-backend.c | 7 ++-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/refs.c b/refs.c
index 7c1e206e08..0d9a1348cd 100644
--- a/refs.c
+++ b/refs.c
@@ -906,9 +906,6 @@ struct ref_update *ref_transaction_add_update(
if (transaction->state != REF_TRANSACTION_OPEN)
die("BUG: update called for transaction that is not open");
 
-   if ((flags & REF_ISPRUNING) && !(flags & REF_NODEREF))
-   die("BUG: REF_ISPRUNING set without REF_NODEREF");
-
FLEX_ALLOC_STR(update, refname, refname);
ALLOC_GROW(transaction->updates, transaction->nr + 1, 
transaction->alloc);
transaction->updates[transaction->nr++] = update;
diff --git a/refs/files-backend.c b/refs/files-backend.c
index ba72d28b13..a47771e4d4 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2518,13 +2518,18 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
 * transaction. (If we end up splitting up any updates using
 * split_symref_update() or split_head_update(), those
 * functions will check that the new updates don't have the
-* same refname as any existing ones.)
+* same refname as any existing ones.) Also fail if any of the
+* updates use REF_ISPRUNING without REF_NODEREF.
 */
for (i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
struct string_list_item *item =
string_list_append(_refnames, update->refname);
 
+   if ((update->flags & REF_ISPRUNING) &&
+   !(update->flags & REF_NODEREF))
+   BUG("REF_ISPRUNING set without REF_NODEREF");
+
/*
 * We store a pointer to update in item->util, but at
 * the moment we never use the value of this field
-- 
2.14.1

[PATCH v2 6/9] refs: rename constant `REF_NODEREF` to `REF_NO_DEREF`

Even after working with this code for years, I still see this constant
name as "ref node ref". Rename it to make it's meaning clearer.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 builtin/am.c   |  2 +-
 builtin/branch.c   |  2 +-
 builtin/checkout.c |  2 +-
 builtin/clone.c|  4 ++--
 builtin/notes.c|  2 +-
 builtin/remote.c   |  6 +++---
 builtin/symbolic-ref.c |  2 +-
 builtin/update-ref.c   |  4 ++--
 refs.h |  6 +++---
 refs/files-backend.c   | 40 
 refs/refs-internal.h   |  4 ++--
 sequencer.c|  6 +++---
 12 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index c9bb14a6c2..894290e2d3 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -2151,7 +2151,7 @@ static void am_abort(struct am_state *state)
   has_curr_head ? _head : NULL, 0,
   UPDATE_REFS_DIE_ON_ERR);
else if (curr_branch)
-   delete_ref(NULL, curr_branch, NULL, REF_NODEREF);
+   delete_ref(NULL, curr_branch, NULL, REF_NO_DEREF);
 
free(curr_branch);
am_destroy(state);
diff --git a/builtin/branch.c b/builtin/branch.c
index b1ed649300..33fd5fcfd1 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -258,7 +258,7 @@ static int delete_branches(int argc, const char **argv, int 
force, int kinds,
}
 
if (delete_ref(NULL, name, is_null_oid() ? NULL : ,
-  REF_NODEREF)) {
+  REF_NO_DEREF)) {
error(remote_branch
  ? _("Error deleting remote-tracking branch '%s'")
  : _("Error deleting branch '%s'"),
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 463a337e5d..114028ee01 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -665,7 +665,7 @@ static void update_refs_for_switch(const struct 
checkout_opts *opts,
/* Nothing to do. */
} else if (opts->force_detach || !new->path) {  /* No longer on any 
branch. */
update_ref(msg.buf, "HEAD", >commit->object.oid, NULL,
-  REF_NODEREF, UPDATE_REFS_DIE_ON_ERR);
+  REF_NO_DEREF, UPDATE_REFS_DIE_ON_ERR);
if (!opts->quiet) {
if (old->path &&
advice_detached_head && !opts->force_detach)
diff --git a/builtin/clone.c b/builtin/clone.c
index 695bdd7046..557c6c3c06 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -689,7 +689,7 @@ static void update_head(const struct ref *our, const struct 
ref *remote,
} else if (our) {
struct commit *c = lookup_commit_reference(>old_oid);
/* --branch specifies a non-branch (i.e. tags), detach HEAD */
-   update_ref(msg, "HEAD", >object.oid, NULL, REF_NODEREF,
+   update_ref(msg, "HEAD", >object.oid, NULL, REF_NO_DEREF,
   UPDATE_REFS_DIE_ON_ERR);
} else if (remote) {
/*
@@ -697,7 +697,7 @@ static void update_head(const struct ref *our, const struct 
ref *remote,
 * HEAD points to a branch but we don't know which one.
 * Detach HEAD in all these cases.
 */
-   update_ref(msg, "HEAD", >old_oid, NULL, REF_NODEREF,
+   update_ref(msg, "HEAD", >old_oid, NULL, REF_NO_DEREF,
   UPDATE_REFS_DIE_ON_ERR);
}
 }
diff --git a/builtin/notes.c b/builtin/notes.c
index 12afdf1907..d7754db143 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -686,7 +686,7 @@ static int merge_abort(struct notes_merge_options *o)
 
if (delete_ref(NULL, "NOTES_MERGE_PARTIAL", NULL, 0))
ret += error(_("failed to delete ref NOTES_MERGE_PARTIAL"));
-   if (delete_ref(NULL, "NOTES_MERGE_REF", NULL, REF_NODEREF))
+   if (delete_ref(NULL, "NOTES_MERGE_REF", NULL, REF_NO_DEREF))
ret += error(_("failed to delete ref NOTES_MERGE_REF"));
if (notes_merge_abort(o))
ret += error(_("failed to remove 'git notes merge' worktree"));
diff --git a/builtin/remote.c b/builtin/remote.c
index 0fddc64461..3d38c6150c 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -693,7 +693,7 @@ static int mv(int argc, const char **argv)
read_ref_full(item->string, RESOLVE_REF_READING, , );
if (!(flag & REF_ISSYMREF))
continue;
-   if (delete_ref(NULL, item->string, NULL, REF_NODEREF))
+   if (delete_ref(NULL, item->string, NULL, REF_NO_DEREF))
die(_(

[PATCH v2 3/9] ref_transaction_update(): die on disallowed flags

Callers shouldn't be passing disallowed flags into
`ref_transaction_update()`. So instead of masking them off, treat it
as a bug if any are set.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/refs.c b/refs.c
index 62a7621025..7c1e206e08 100644
--- a/refs.c
+++ b/refs.c
@@ -940,7 +940,8 @@ int ref_transaction_update(struct ref_transaction 
*transaction,
return -1;
}
 
-   flags &= REF_TRANSACTION_UPDATE_ALLOWED_FLAGS;
+   if (flags & ~REF_TRANSACTION_UPDATE_ALLOWED_FLAGS)
+   BUG("illegal flags 0x%x passed to ref_transaction_update()", 
flags);
 
flags |= (new_oid ? REF_HAVE_NEW : 0) | (old_oid ? REF_HAVE_OLD : 0);
 
-- 
2.14.1

[PATCH v2 9/9] refs: update some more docs to use "oid" rather than "sha1"

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs.c|  2 +-
 refs.h|  8 
 refs/files-backend.c  | 19 +--
 refs/packed-backend.c |  2 +-
 refs/ref-cache.c  |  4 ++--
 refs/refs-internal.h  | 18 +-
 6 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/refs.c b/refs.c
index 0d9a1348cd..339d4318ee 100644
--- a/refs.c
+++ b/refs.c
@@ -770,7 +770,7 @@ static int read_ref_at_ent(struct object_id *ooid, struct 
object_id *noid,
if (cb->cutoff_cnt)
*cb->cutoff_cnt = cb->reccnt - 1;
/*
-* we have not yet updated cb->[n|o]sha1 so they still
+* we have not yet updated cb->[n|o]oid so they still
 * hold the values for the previous record.
 */
if (!is_null_oid(>ooid)) {
diff --git a/refs.h b/refs.h
index d396012367..18582a408c 100644
--- a/refs.h
+++ b/refs.h
@@ -126,7 +126,7 @@ int peel_ref(const char *refname, struct object_id *oid);
 /**
  * Resolve refname in the nested "gitlink" repository in the specified
  * submodule (which must be non-NULL). If the resolution is
- * successful, return 0 and set sha1 to the name of the object;
+ * successful, return 0 and set oid to the name of the object;
  * otherwise, return a non-zero value.
  */
 int resolve_gitlink_ref(const char *submodule, const char *refname,
@@ -260,7 +260,7 @@ struct ref_transaction;
 
 /*
  * The signature for the callback function for the for_each_*()
- * functions below.  The memory pointed to by the refname and sha1
+ * functions below.  The memory pointed to by the refname and oid
  * arguments is only guaranteed to be valid for the duration of a
  * single callback invocation.
  */
@@ -354,7 +354,7 @@ int reflog_exists(const char *refname);
 
 /*
  * Delete the specified reference. If old_oid is non-NULL, then
- * verify that the current value of the reference is old_sha1 before
+ * verify that the current value of the reference is old_oid before
  * deleting it. If old_oid is NULL, delete the reference if it
  * exists, regardless of its old value. It is an error for old_oid to
  * be null_oid. msg and flags are passed through to
@@ -633,7 +633,7 @@ int ref_transaction_abort(struct ref_transaction 
*transaction,
  * It is a bug to call this function when there might be other
  * processes accessing the repository or if there are existing
  * references that might conflict with the ones being created. All
- * old_sha1 values must either be absent or NULL_SHA1.
+ * old_oid values must either be absent or null_oid.
  */
 int initial_ref_transaction_commit(struct ref_transaction *transaction,
   struct strbuf *err);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index bb10b715a8..2298f900dd 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -240,7 +240,7 @@ static void loose_fill_ref_dir(struct ref_store *ref_store,
} else if (is_null_oid()) {
/*
 * It is so astronomically unlikely
-* that NULL_SHA1 is the SHA-1 of an
+* that null_oid is the OID of an
 * actual object that we consider its
 * appearance in a loose reference
 * file to be repo corruption
@@ -473,7 +473,7 @@ static void unlock_ref(struct ref_lock *lock)
  * are passed to refs_verify_refname_available() for this check.
  *
  * If mustexist is not set and the reference is not found or is
- * broken, lock the reference anyway but clear sha1.
+ * broken, lock the reference anyway but clear old_oid.
  *
  * Return 0 on success. On failure, write an error message to err and
  * return TRANSACTION_NAME_CONFLICT or TRANSACTION_GENERIC_ERROR.
@@ -1648,9 +1648,8 @@ static int files_log_ref_write(struct files_ref_store 
*refs,
 }
 
 /*
- * Write sha1 into the open lockfile, then close the lockfile. On
- * errors, rollback the lockfile, fill in *err and
- * return -1.
+ * Write oid into the open lockfile, then close the lockfile. On
+ * errors, rollback the lockfile, fill in *err and return -1.
  */
 static int write_ref_to_lockfile(struct ref_lock *lock,
 const struct object_id *oid, struct strbuf 
*err)
@@ -2272,7 +2271,7 @@ static int split_symref_update(struct files_ref_store 
*refs,
 
/*
 * Change the symbolic ref update to log only. Also, it
-* doesn't need to check its old SHA-1 value, as that will be
+* doesn't need to check its old OID value, as that will be
 * done when new_update is processed.
 */
update->flags |= REF_LOG_ONLY | REF_NO_DEREF;
@@ -2341,7 +2340,7 @@ static int check_old_oid(struct ref_update *update, 
struct object_

[PATCH v2 5/9] refs: tidy up and adjust visibility of the `ref_update` flags

The constants used for `ref_update::flags` were rather disorganized:

* The definitions in `refs.h` were not close to the functions that
  used them.

* Maybe constants were defined in `refs-internal.h`, making them
  visible to the whole refs module, when in fact they only made sense
  for the files backend.

* Their documentation wasn't very consistent and partly still referred
  to sha1s rather than oids.

* The numerical values followed no rational scheme

Fix all of these problems. The main functional improvement is that
some constants' visibility is now limited to `files-backend.c`.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs.h   | 67 ++--
 refs/files-backend.c | 45 +++
 refs/refs-internal.h | 67 
 3 files changed, 99 insertions(+), 80 deletions(-)

diff --git a/refs.h b/refs.h
index 4ffef9502d..261d46c10c 100644
--- a/refs.h
+++ b/refs.h
@@ -335,22 +335,6 @@ void warn_dangling_symrefs(FILE *fp, const char *msg_fmt,
  */
 int refs_pack_refs(struct ref_store *refs, unsigned int flags);
 
-/*
- * Flags controlling ref_transaction_update(), ref_transaction_create(), etc.
- * REF_NODEREF: act on the ref directly, instead of dereferencing
- *  symbolic references.
- *
- * Other flags are reserved for internal use.
- */
-#define REF_NODEREF0x01
-#define REF_FORCE_CREATE_REFLOG 0x40
-
-/*
- * Flags that can be passed in to ref_transaction_update
- */
-#define REF_TRANSACTION_UPDATE_ALLOWED_FLAGS \
-   (REF_NODEREF | REF_FORCE_CREATE_REFLOG)
-
 /*
  * Setup reflog before using. Fill in err and return -1 on failure.
  */
@@ -478,22 +462,23 @@ struct ref_transaction *ref_transaction_begin(struct 
strbuf *err);
  *
  * refname -- the name of the reference to be affected.
  *
- * new_sha1 -- the SHA-1 that should be set to be the new value of
- * the reference. Some functions allow this parameter to be
+ * new_oid -- the object ID that should be set to be the new value
+ * of the reference. Some functions allow this parameter to be
  * NULL, meaning that the reference is not changed, or
- * null_sha1, meaning that the reference should be deleted. A
+ * null_oid, meaning that the reference should be deleted. A
  * copy of this value is made in the transaction.
  *
- * old_sha1 -- the SHA-1 value that the reference must have before
+ * old_oid -- the object ID that the reference must have before
  * the update. Some functions allow this parameter to be NULL,
  * meaning that the old value of the reference is not checked,
- * or null_sha1, meaning that the reference must not exist
+ * or null_oid, meaning that the reference must not exist
  * before the update. A copy of this value is made in the
  * transaction.
  *
  * flags -- flags affecting the update, passed to
- * update_ref_lock(). Can be REF_NODEREF, which means that
- * symbolic references should not be followed.
+ * update_ref_lock(). Possible flags: REF_NODEREF,
+ * REF_FORCE_CREATE_REFLOG. See those constants for more
+ * information.
  *
  * msg -- a message describing the change (for the reflog).
  *
@@ -509,11 +494,37 @@ struct ref_transaction *ref_transaction_begin(struct 
strbuf *err);
  */
 
 /*
- * Add a reference update to transaction. new_oid is the value that
- * the reference should have after the update, or null_oid if it
- * should be deleted. If new_oid is NULL, then the reference is not
- * changed at all. old_oid is the value that the reference must have
- * before the update, or null_oid if it must not have existed
+ * The following flags can be passed to ref_transaction_update() etc.
+ * Internally, they are stored in `ref_update::flags`, along with some
+ * internal flags.
+ */
+
+/*
+ * Act on the ref directly; i.e., without dereferencing symbolic refs.
+ * If this flag is not specified, then symbolic references are
+ * dereferenced and the update is applied to the referent.
+ */
+#define REF_NODEREF (1 << 0)
+
+/*
+ * Force the creation of a reflog for this reference, even if it
+ * didn't previously have a reflog.
+ */
+#define REF_FORCE_CREATE_REFLOG (1 << 1)
+
+/*
+ * Bitmask of all of the flags that are allowed to be passed in to
+ * ref_transaction_update() and friends:
+ */
+#define REF_TRANSACTION_UPDATE_ALLOWED_FLAGS \
+   (REF_NODEREF | REF_FORCE_CREATE_REFLOG)
+
+/*
+ * Add a reference update to transaction. `new_oid` is the value that
+ * the reference should have after the update, or `null_oid` if it
+ * should be deleted. If `new_oid` is NULL, then the reference is not
+ * changed at all. `old_oid` is the value that the reference must have
+ * before the update, or `null_oid` if it must not have existed
  * beforehand. The old value is checked after

[PATCH v2 0/9] Tidy up the constants related to ref_update::flags

This is a reroll of a patch series that tidies up some stuff around
the ref_update::flags constants. Thanks to Junio and Martin for their
comments about v1 [1].

Relative to v1, this version:

* In patch 5, cleans up the touched comments to refer to OIDs rather
  than SHA-1s.

* Adds a patch 8, which changes `write_packed_entry()` to take
  `object_id` arguments.

* Adds a patch 9, which cleans up some remaining comments across all
  of the refs-related files to refer to OIDs rather than SHA-1s.

This patch series depends on bc/object-id. The patches are also
available from my GitHub fork as branch `tidy-ref-update-flags` [2].

Michael

[1] https://public-inbox.org/git/cover.1509183413.git.mhag...@alum.mit.edu/
[2] https://github.com/mhagger/git

Michael Haggerty (9):
  files_transaction_prepare(): don't leak flags to packed transaction
  prune_ref(): call `ref_transaction_add_update()` directly
  ref_transaction_update(): die on disallowed flags
  ref_transaction_add_update(): remove a check
  refs: tidy up and adjust visibility of the `ref_update` flags
  refs: rename constant `REF_NODEREF` to `REF_NO_DEREF`
  refs: rename constant `REF_ISPRUNING` to `REF_IS_PRUNING`
  write_packed_entry(): take `object_id` arguments
  refs: update some more docs to use "oid" rather than "sha1"

 builtin/am.c   |   2 +-
 builtin/branch.c   |   2 +-
 builtin/checkout.c |   2 +-
 builtin/clone.c|   4 +-
 builtin/notes.c|   2 +-
 builtin/remote.c   |   6 +--
 builtin/symbolic-ref.c |   2 +-
 builtin/update-ref.c   |   4 +-
 refs.c |   8 ++-
 refs.h |  77 -
 refs/files-backend.c   | 132 +++--
 refs/packed-backend.c  |  18 +++
 refs/ref-cache.c   |   4 +-
 refs/refs-internal.h   |  81 +-
 sequencer.c|   6 +--
 15 files changed, 188 insertions(+), 162 deletions(-)

-- 
2.14.1

[PATCH v2 1/9] files_transaction_prepare(): don't leak flags to packed transaction

The files backend uses `ref_update::flags` for several internal flags.
But those flags have no meaning to the packed backend. So when adding
updates for the packed-refs transaction, only use flags that make
sense to the packed backend.

`REF_NODEREF` is part of the public interface, and it's logically what
we want, so include it. In fact it is actually ignored by the packed
backend (which doesn't support symbolic references), but that's its
own business.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/files-backend.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 2bd54e11ae..fadf1036d3 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2594,8 +2594,8 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
 
ref_transaction_add_update(
packed_transaction, update->refname,
-   update->flags & ~REF_HAVE_OLD,
-   >new_oid, >old_oid,
+   REF_HAVE_NEW | REF_NODEREF,
+   >new_oid, NULL,
NULL);
}
}
-- 
2.14.1

[PATCH v2 7/9] refs: rename constant `REF_ISPRUNING` to `REF_IS_PRUNING`

Underscores are cheap, and help readability.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/files-backend.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 71e088e811..bb10b715a8 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -14,7 +14,7 @@
  * This backend uses the following flags in `ref_update::flags` for
  * internal bookkeeping purposes. Their numerical values must not
  * conflict with REF_NO_DEREF, REF_FORCE_CREATE_REFLOG, REF_HAVE_NEW,
- * REF_HAVE_OLD, or REF_ISPRUNING, which are also stored in
+ * REF_HAVE_OLD, or REF_IS_PRUNING, which are also stored in
  * `ref_update::flags`.
  */
 
@@ -22,7 +22,7 @@
  * Used as a flag in ref_update::flags when a loose ref is being
  * pruned. This flag must only be used when REF_NO_DEREF is set.
  */
-#define REF_ISPRUNING (1 << 4)
+#define REF_IS_PRUNING (1 << 4)
 
 /*
  * Flag passed to lock_ref_sha1_basic() telling it to tolerate broken
@@ -1044,7 +1044,7 @@ static void prune_ref(struct files_ref_store *refs, 
struct ref_to_prune *r)
goto cleanup;
ref_transaction_add_update(
transaction, r->name,
-   REF_NO_DEREF | REF_HAVE_NEW | REF_HAVE_OLD | 
REF_ISPRUNING,
+   REF_NO_DEREF | REF_HAVE_NEW | REF_HAVE_OLD | 
REF_IS_PRUNING,
_oid, >oid, NULL);
if (ref_transaction_commit(transaction, ))
goto cleanup;
@@ -2177,7 +2177,7 @@ static int split_head_update(struct ref_update *update,
struct ref_update *new_update;
 
if ((update->flags & REF_LOG_ONLY) ||
-   (update->flags & REF_ISPRUNING) ||
+   (update->flags & REF_IS_PRUNING) ||
(update->flags & REF_UPDATE_VIA_HEAD))
return 0;
 
@@ -2564,16 +2564,16 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
 * split_symref_update() or split_head_update(), those
 * functions will check that the new updates don't have the
 * same refname as any existing ones.) Also fail if any of the
-* updates use REF_ISPRUNING without REF_NO_DEREF.
+* updates use REF_IS_PRUNING without REF_NO_DEREF.
 */
for (i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
struct string_list_item *item =
string_list_append(_refnames, update->refname);
 
-   if ((update->flags & REF_ISPRUNING) &&
+   if ((update->flags & REF_IS_PRUNING) &&
!(update->flags & REF_NO_DEREF))
-   BUG("REF_ISPRUNING set without REF_NO_DEREF");
+   BUG("REF_IS_PRUNING set without REF_NO_DEREF");
 
/*
 * We store a pointer to update in item->util, but at
@@ -2632,7 +2632,7 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
 
if (update->flags & REF_DELETING &&
!(update->flags & REF_LOG_ONLY) &&
-   !(update->flags & REF_ISPRUNING)) {
+   !(update->flags & REF_IS_PRUNING)) {
/*
 * This reference has to be deleted from
 * packed-refs if it exists there.
@@ -2749,7 +2749,7 @@ static int files_transaction_finish(struct ref_store 
*ref_store,
struct ref_update *update = transaction->updates[i];
if (update->flags & REF_DELETING &&
!(update->flags & REF_LOG_ONLY) &&
-   !(update->flags & REF_ISPRUNING)) {
+   !(update->flags & REF_IS_PRUNING)) {
strbuf_reset();
files_reflog_path(refs, , update->refname);
if (!unlink_or_warn(sb.buf))
-- 
2.14.1

[PATCH v2 8/9] write_packed_entry(): take `object_id` arguments

Change `write_packed_entry()` to take `struct object_id *` rather than
`unsigned char *` arguments.

Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
---
 refs/packed-backend.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 74f1dea0f4..43ad74fc5a 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -961,11 +961,11 @@ static struct ref_iterator *packed_ref_iterator_begin(
  * by the failing call to `fprintf()`.
  */
 static int write_packed_entry(FILE *fh, const char *refname,
- const unsigned char *sha1,
- const unsigned char *peeled)
+ const struct object_id *oid,
+ const struct object_id *peeled)
 {
-   if (fprintf(fh, "%s %s\n", sha1_to_hex(sha1), refname) < 0 ||
-   (peeled && fprintf(fh, "^%s\n", sha1_to_hex(peeled)) < 0))
+   if (fprintf(fh, "%s %s\n", oid_to_hex(oid), refname) < 0 ||
+   (peeled && fprintf(fh, "^%s\n", oid_to_hex(peeled)) < 0))
return -1;
 
return 0;
@@ -1203,8 +1203,8 @@ static int write_with_updates(struct packed_ref_store 
*refs,
int peel_error = ref_iterator_peel(iter, );
 
if (write_packed_entry(out, iter->refname,
-  iter->oid->hash,
-  peel_error ? NULL : peeled.hash))
+  iter->oid,
+  peel_error ? NULL : ))
goto write_error;
 
if ((ok = ref_iterator_advance(iter)) != ITER_OK)
@@ -1224,8 +1224,8 @@ static int write_with_updates(struct packed_ref_store 
*refs,
 );
 
if (write_packed_entry(out, update->refname,
-  update->new_oid.hash,
-  peel_error ? NULL : peeled.hash))
+  >new_oid,
+  peel_error ? NULL : ))
goto write_error;
 
i++;
-- 
2.14.1

Re: [PATCH 5/7] refs: tidy up and adjust visibility of the `ref_update` flags

On 11/01/2017 09:18 AM, Martin Ågren wrote:
> On 28 October 2017 at 11:49, Michael Haggerty <mhag...@alum.mit.edu> wrote:
>> The constants used for `ref_update::flags` were rather disorganized:
> 
>> * The documentation wasn't very consistent and partly still referred
>>   to sha1s rather than oids.
> 
>> @@ -478,22 +462,23 @@ struct ref_transaction *ref_transaction_begin(struct 
>> strbuf *err);
>>   *
>>   * refname -- the name of the reference to be affected.
>>   *
>> - * new_sha1 -- the SHA-1 that should be set to be the new value of
>> + * new_oid -- the SHA-1 that should be set to be the new value of
>>   * the reference. Some functions allow this parameter to be
>>   * NULL, meaning that the reference is not changed, or
>> - * null_sha1, meaning that the reference should be deleted. A
>> + * null_oid, meaning that the reference should be deleted. A
>>   * copy of this value is made in the transaction.
>>   *
>> - * old_sha1 -- the SHA-1 value that the reference must have before
>> + * old_oid -- the SHA-1 value that the reference must have before
> 
> You still refer to "SHA-1" twice in this hunk. Maybe squash this in, at
> least partially? This addresses all remaining "sha"/"SHA" in refs.h.
> [...]

Thanks for this.

I'll squash the changes that have to do with these flags into this
commit, and change the other docstrings as part of a separate commit
that also fixes up similar problems in other refs-related comments.

I also realized that `write_packed_entry()` still takes `unsigned char
*` arguments. I'll fix that, too, in yet another commit.

Thanks,
Michael

Re: [PATCH 1/7] files_transaction_prepare(): don't leak flags to packed transaction