Fw: [git-users] git fsck error - duplicate file entries - different then existing stackoverflow scenarios

2015-11-12 Thread Konstantin Khomoutov
A user recently asked an interesting question on the git-users list.
I think it warrants the attention of specialists more hard-core than
we are over at git-users.

So I'd like to solicit help from those knowledgeable, if possible.

Begin forwarded message:

Date: Wed, 11 Nov 2015 14:30:40 -0800 (PST)
From: Wind Over Water 
To: Git for human beings 
Subject: [git-users] git fsck error - duplicate file entries -
different then existing stackoverflow scenarios


Hi all,

I have a repo that is giving a 'git fsck --full' error that seems to be 
different from the existing questions and answers on stackoverflow on
this topic.  For example, in our fsck error it is not obvious which
file is actually duplicated and how/where.  And there is no commit sha
involved - apparently only blob and tree sha's.  But then finding good
documentation on this is challenging.

Might anyone have a pointer as to what to read to help figure out a 
solution/fix to the below?  Or know of a solution outright?

Thanks much in advance!

-sandy

$ git fsck --full
Checking object directories: 100% (256/256), done.
error in tree df79068051fa8702eae7e91635cca7eee1339002: contains duplicate file entries
error in tree c2d09540a3c3f44c42be1dc8a2b0afa73a35f861: contains duplicate file entries
Checking objects: 100% (623704/623704), done.
Checking connectivity: 623532, done.
dangling commit 4d1402c8c74c9f4de6172d7dbd5a14c41683c9e8


$ git ls-tree df79068051fa8702eae7e91635cca7eee1339002
100644 blob 14d6d1a6a2f4a7db4e410583c2893d24cb587766 build.gradle
120000 blob cd70e37500a35663957cf60f011f81703be5d032 msrc
040000 tree 658c892e15fbe0d3ea6b8490d9d54c5f2e658fc9 msrc
100644 blob f623819c94a08252298220871ac0ba1118372e59 pom.xml
100644 blob 9223cc2fddb138f691312c1ea2656b9dc17612d2 settings.gradle
040000 tree c3bac1d92722bdee9588a27747b164baa275201f src

$ git ls-tree c2d09540a3c3f44c42be1dc8a2b0afa73a35f861
100644 blob 14d6d1a6a2f4a7db4e410583c2893d24cb587766 build.gradle
120000 blob cd70e37500a35663957cf60f011f81703be5d032 msrc
040000 tree 658c892e15fbe0d3ea6b8490d9d54c5f2e658fc9 msrc
100644 blob f623819c94a08252298220871ac0ba1118372e59 pom.xml
100644 blob 9223cc2fddb138f691312c1ea2656b9dc17612d2 settings.gradle
040000 tree a5aa6758a25fee779cbb8c9717d744297071ea79 src


$ git show cd70e37500a35663957cf60f011f81703be5d032
src/main/java/com/foo/bar/baz/common/

$ git show 658c892e15fbe0d3ea6b8490d9d54c5f2e658fc9
tree 658c892e15fbe0d3ea6b8490d9d54c5f2e658fc9

BillingAggregator.java
BillingDataThriftAdapter.java
[...]
MetricsProcessor.java
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fw: [git-users] git fsck error - duplicate file entries - different then existing stackoverflow scenarios

2015-11-12 Thread Jeff King
On Thu, Nov 12, 2015 at 02:02:10PM +0300, Konstantin Khomoutov wrote:

> A user recently asked an interesting question on the git-users list.
> I think it warrants the attention of specialists more hard-core than
> we are over at git-users.
> 
> So I'd like to solicit help from those knowledgeable, if possible.

Thanks. Curating user questions and forwarding the hard ones here is
appreciated.

> I have a repo that is giving a 'git fsck --full' error that seems to be 
> different from the existing questions and answers on stackoverflow on
> this topic.  For example, in our fsck error it is not obvious which
> file is actually duplicated and how/where.  And there is no commit sha
> involved - apparently only blob and tree sha's.  But then finding good
> documentation on this is challenging.

Yes, fsck does not traverse the graph in order. So it sees a problem
with a particular tree, but cannot know where that tree is within the
whole project tree, or which commits reference it. In fact, an arbitrary
number of commits might reference it.

The most useful thing is sometimes to ask which commit introduced the
tree (which can _also_ have multiple answers, but usually just one). You
can do that by walking the history, like this:

  tree=df79068051fa8702eae7e91635cca7eee1339002
  git log --all --format=raw --raw -t --no-abbrev | less +/$tree

That will visit each commit. The options are:

  - we visit commits reachable from all branches and tags (--all)

  - we include the sha1 of the root tree (due to --format=raw)

  - adding --raw shows the raw diff, which includes the sha1 of each
    file touched by the commit

  - using "-t" includes the raw diff for trees, rather than just blobs

  - using "--no-abbrev" gives full 40-hex sha1s

And then "less +/$tree" will open the pager and immediately jump to the
first instance of the sha1 in question.
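
If you would rather have a list of commits than page through the
output, the same log stream can be filtered. What follows is only a
minimal sketch: the function name commits_mentioning is made up for
illustration, and it assumes the raw format starts each commit with an
unindented "commit <sha1>" line (message lines are indented, so they
should not be mistaken for a header).

```shell
# Hypothetical helper: read "git log --format=raw --raw -t --no-abbrev"
# output on stdin and print each commit whose header or raw diff
# mentions the given object id.
commits_mentioning() {
    awk -v id="$1" '
        /^commit / { commit = $2 }       # remember the commit being shown
        index($0, id) && commit != last {
            print commit                 # report each commit only once
            last = commit
        }
    '
}

# Intended use (assumes you are inside the affected repository):
#   tree=df79068051fa8702eae7e91635cca7eee1339002
#   git log --all --format=raw --raw -t --no-abbrev | commits_mentioning "$tree"
```

The oldest commit in that list is usually the one that introduced the
tree; later ones may simply touch it again.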

But of course that doesn't tell you how to fix it. It might tell you how
the bogus object came about (and it is a bogus object; a bug-free git
implementation should _never_ produce a tree with duplicate entries.
AFAIK we have never had such a bug in Git itself, but I have
occasionally come across problematic entries that I suspect were created
with very old versions of JGit).

> error in tree df79068051fa8702eae7e91635cca7eee1339002: contains duplicate file entries
> [...]
> $ git ls-tree df79068051fa8702eae7e91635cca7eee1339002
> 100644 blob 14d6d1a6a2f4a7db4e410583c2893d24cb587766 build.gradle
> 120000 blob cd70e37500a35663957cf60f011f81703be5d032 msrc
> 040000 tree 658c892e15fbe0d3ea6b8490d9d54c5f2e658fc9 msrc
> 100644 blob f623819c94a08252298220871ac0ba1118372e59 pom.xml
> 100644 blob 9223cc2fddb138f691312c1ea2656b9dc17612d2 settings.gradle
> 040000 tree c3bac1d92722bdee9588a27747b164baa275201f src

Looks like "msrc" is your duplicate entry (even though the sha1s of the
sub-entries are different, the tree cannot have two entries with the
same name). You can use the "log" trick above to find the full path to it.
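
With a larger tree the duplicate is harder to spot by eye. As a rough
sketch (the function name duplicate_names is hypothetical), you can let
awk pick out any name that appears more than once. It relies on real
ls-tree output separating the name from the rest of the line with a
tab, and it does not handle names that ls-tree quotes because they
contain unusual characters.

```shell
# Hypothetical helper: read "git ls-tree <tree>" output on stdin and
# print every entry name that occurs more than once.
duplicate_names() {
    awk -F'\t' '
        seen[$2]++ == 1 { print $2 }     # print on the second sighting only
    '
}

# Intended use (assumes you are inside the affected repository):
#   git ls-tree df79068051fa8702eae7e91635cca7eee1339002 | duplicate_names
```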

The fact that one is a symlink (mode 120000) and one is a tree means
that whatever git implementation created this presumably has a bug
related to symlinks.

The only way to fix it is to rewrite the history mentioning the tree
(because once the tree is fixed, it will get a new sha1, and then any
commit referencing it will get a new sha1, and commits built on that,
and so forth).
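
To see the new sha1 for yourself, you can build a repaired copy of the
tree by hand. This is only a sketch under assumptions: drop_entry is a
made-up helper, and dropping the symlink rather than the directory is
an arbitrary choice for the example; you have to decide which entry is
the right one to keep. It creates the fixed tree object only and does
not rewrite any commits.

```shell
# Hypothetical helper: read "git ls-tree" output on stdin and drop the
# entry that has the given object type ("blob" or "tree") and name.
drop_entry() {
    awk -F'\t' -v type="$1" -v name="$2" '
        {
            split($1, meta, " ")         # meta[1]=mode meta[2]=type meta[3]=sha1
            if ($2 == name && meta[2] == type) next
            print
        }
    '
}

# Intended use (assumes you are inside the affected repository);
# git mktree prints the sha1 of the newly written tree:
#   git ls-tree df79068051fa8702eae7e91635cca7eee1339002 |
#       drop_entry blob msrc | git mktree
```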

You can use "git filter-branch" to do so. There is a sample command
here:

  http://stackoverflow.com/questions/32577974/duplicate-file-error-while-pushing-mirror-into-git-repository/

that just rewrites each tree via a round-trip to the index (so it's not
clear which of the duplicate entries it will discard). You could also
write a more clever index-filter snippet to use git-update-index to
insert the entry you want.

-Peff