On Tue, Oct 16 2018, Jeff King wrote:

> On Mon, Oct 15, 2018 at 01:01:50PM +0000, Per Lundberg wrote:
>
>> Sorry if this question has been asked before; I skimmed through the list
>> archives and the FAQ but couldn't immediately find it - please point me
>> in the right direction if it has indeed been discussed before.
>
> It is a frequently asked question, but it doesn't seem to be in any FAQ
> that I could find. The behavior you're seeing is intended. See this
> message (and the rest of the thread) for discussion:
>
>   https://public-inbox.org/git/7viq39avay....@alter.siamese.dyndns.org/
>
>> So my question is: is this by design or should this be considered a bug
>> in git? Of course, it depends largely on what .gitignore is being used
>> for - if we are talking about files which can easily be regenerated
>> (build artifacts, node_modules folders etc.) I can totally understand
>> the current behavior, but when dealing with more sensitive & important
>> content it's a bit inconvenient.
>
> Basically: yes. It would be nice to have that "do not track this, but do
> not trash it either" state for a file, but Git does not currently
> support that.

There are some patches in that thread that could be picked up by someone
interested. I think the approach mentioned by Matthieu Moy here makes
the most sense:
https://public-inbox.org/git/vpqd3t9656k....@bauges.imag.fr/

I don't think the rationale mentioned by Junio in
https://public-inbox.org/git/7v4oepaup7....@alter.siamese.dyndns.org/ is
very convincing.

The question is not whether .gitignore is intended to be used in some
specific way, e.g. only ignoring *.o files, but whether we can
reasonably suspect that users use the combination of the features we
expose in such a way that their precious data gets destroyed. User data
should get the benefit of the doubt.

Off the top of my head, I can imagine many ways in which this'll go
wrong:

 1. Even if you're using .gitignore only for "trashable" files, as Junio
    mentions, git not trashing your data depends on everyone who
    modifies .gitignore in your project having enough situational
    awareness not to inadvertently add a glob to the file which
    *accidentally* ignores existing files, and *nothing warns about
    this*.

    Between the caveat noted in "It is not possible to re-include[...]"
    in gitignore(5) and negative pathspecs it can be really easy to get
    this wrong.

    So e.g. in git.git I can add a line with "*" to .gitignore, and
    nothing will complain or look unusual as long as I'm not introducing
    new files, and I'll only find out when some-new-file.c of mine gets
    trashed.

 2. Related, the UI "git add <ignored>" presents is just "Use -f if you
    really want to add them". Users who aren't careful will just think
    "oh, I just need -f in this case" and not alter .gitignore, leaving
    a timebomb for future users.

    Those new users will have no way of knowing that they've cloned a
    repo with a broken overzealous .gitignore, e.g. there's nothing on
    clone that says "you've just cloned a repo with N files, all of
    which are ignored, so git clean etc. will likely wipe out anything
    you have in the checkout".

 3. Since we implicitly expose this "you need a one-off action to
    override .gitignore" noted in #2, users can and *do* use this for
    "soft" ignores.

    E.g. in a big work repo there's an ignore for *.png, even though the
    repo has thousands of such files, because it's not considered good
    practice to add them anymore (there's another static repo), and
    someone thought to use .gitignore to enforce that suggestion.

    I have a personal repo where I only want *.gpg files, and due to the
    inability to re-include files recursively (noted in #1) I just
    ignore '*' and use git veeery carefully. I was only worried about
    'git clean' so far, but now I see I need to worry about "checkout"
    as well.
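
The scenario in #1 can be reproduced in a throwaway repository. What
follows is only a minimal sketch under stated assumptions: the temporary
directory, the file existing.c, and the demo commit identity are
hypothetical and made up for illustration; only the "*" pattern and
some-new-file.c come from the description above. It uses a dry run
("git clean -n"), so nothing is actually deleted.

```shell
#!/bin/sh
# Sketch only: runs in a fresh temporary repository, never in a real checkout.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# A tracked file, committed before the overzealous pattern is added
# (existing.c is a hypothetical name).
echo 'int main(void) { return 0; }' >existing.c
git add existing.c
git -c user.name=demo -c user.email=demo@example.com commit -q -m initial

# The "accidental" glob from #1.
echo '*' >.gitignore

# A new, never-committed file with real work in it.
echo 'precious' >some-new-file.c

# Nothing looks unusual: the new file is not reported as untracked.
status_out=$(git status --porcelain)

# check-ignore -v names the pattern responsible for ignoring the file.
ignore_out=$(git check-ignore -v some-new-file.c)

# Dry run (-n): list what "git clean -X" would delete, without deleting.
clean_out=$(git clean -n -X)

printf 'status: [%s]\n' "$status_out"
printf 'check-ignore: %s\n' "$ignore_out"
printf 'clean -n -X:\n%s\n' "$clean_out"
```

The empty status output is the point: there is no visible hint that
some-new-file.c is now in the "ignored, therefore expendable" category
until a command that acts on ignored files is run.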

But maybe the use-cases I'm mentioning are highly unusual and the repos
at work have ended up in some bizarre state and nobody else cares about
this.

It would be interesting if someone at a big git hosting provider (hint:
Jeff :) could provide some numbers about how common it is to have a
repository containing tracked files ignored by a .gitignore the
repository itself carries. This wouldn't cover all of #1-3 above, but is
probably a pretty good proxy metric.

I thought this could be done by:

    git ls-tree -r --name-only HEAD | git check-ignore --no-index --stdin

But I see that e.g. on git.git this goes wrong due to
t/helper/.gitignore. So I don't know how one would answer "does this
repo have .gitignored files tracked?" in a one-liner.
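
One possible approach, offered only as a sketch and not as something
proposed in this thread, is to ask "git ls-files" for cached (tracked)
entries that match an exclude pattern. It assumes a checked-out working
tree, so it would not apply directly to bare repositories on a hosting
service, and --exclude-standard also consults .git/info/exclude and the
user's global excludes file, not just the .gitignore files the
repository carries. The repository and file names below are
hypothetical.

```shell
#!/bin/sh
# Sketch only: builds a throwaway repository to show the query.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Ignore *.png, then force-add one anyway (the "git add -f" case from #2).
echo '*.png' >.gitignore
echo 'fake image' >logo.png
echo 'text' >README

git add .gitignore README
git add -f logo.png
git -c user.name=demo -c user.email=demo@example.com commit -q -m initial

# -c: tracked (cached) files, -i: only those matching an ignore pattern.
tracked_ignored=$(git ls-files -c -i --exclude-standard)
printf '%s\n' "$tracked_ignored"
```

Whether this handles the t/helper/.gitignore case in git.git correctly
has not been checked here, so treat it as a starting point rather than
an answer.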
