On Tue, Feb 06, 2018 at 02:24:25PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 3) Such hooks slow down pushes, especially on big repos; you can
> optimize things a bit (e.g. only look in the same directories), but
> pathologically you end up needing to compare the cross-product of
> changed files vs. all existing files for each changed file.
I think you could just complain about any tree that contains duplicate
entries after normalization. I.e.:
  git rev-list --objects $new --not $old |
  awk '{print $1}' |
  git cat-file --batch-check='%(objecttype) %(objectname)' |
  awk '/^tree/ {print $2}' |
  while read tree; do
    dups=$(git ls-tree $tree | cut -f 2- | tr A-Z a-z | sort | uniq -d)
    test -z "$dups" || echo "$tree has duplicates: $dups"
  done
That gives reasonable algorithmic complexity, but of course the shell
implementation is horrific. One could imagine that this could be
implemented as part of fsck_tree(), though, which is already reading
through all the entries (unfortunately it requires auxiliary storage
linear with the size of a given tree object, but that's not too bad).
But it would probably need:
1. To be enabled as an optional fsck warning, possibly even defaulting
to "ignore".
2. The "tr" step to be pluggable, so it could be any arbitrary
transformation. Case-folding is the obvious one, but in theory you
could match the normalization behavior of certain popular
filesystems (e.g., the Unicode normalization that HFS+ performs).
I'm not entirely convinced it's worth all of this effort, but I think it
would be _possible_ at least.
-Peff