Re: Troubleshoot clone issue to NFS.

2015-05-24 Thread Duy Nguyen
On Fri, May 22, 2015 at 2:12 PM, Jeff King p...@peff.net wrote:
 So I think there are two possibilities for improving this:

   1. Find places where we expect the object will not exist (like the
  collision_test check you pointed out) and use a
  has_sha1_file_fast that accepts that it may very occasionally
  erroneously return false. In this case it would mean potentially
  skipping a collision check, but I think that is OK. That could have
  security implications, but only if an attacker:

a. has broken sha1 to generate a colliding object

b. can manipulate the victim into repacking in a loop

c. can manipulate the victim into fetching (or receiving a push)
   simultaneously with (b)

  at which point they can try to race the repack procedure to add
  their colliding object to the repository. It seems rather unlikely
  (especially part a).

In case you want to back away from option 2 because it starts to leak
raciness, which your old commit tried to fix in the first place. I
think the only other place that tests for lots of non-existent loose
objects is write_sha1_file (e.g. tar -xf bigtarball.tar.gz; cd
bigtarball; git init; git add .). But the number of calls should be
much smaller compared to index-pack and it does not use has_sha1_file,
it uses check_and_freshen_file() instead.

There are other places where has_sha1_file() may return 0, but I think
the number of calls is even smaller to bother (shallow.c,
fetch-pack.c, apply.c, buik-checkin.c)
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] git-p4: fix handling of multi-word P4EDITOR

2015-05-24 Thread Luke Diamand

On 07/05/15 23:16, Junio C Hamano wrote:

Luke Diamand l...@diamand.org writes:



[Resurrecting old thread]



Looking at run-command.c, GIT_WINDOES_NATIVE and POSIX seems to use
pretty much the same construct, except that they use SHELL_PATH
instead of sh.


I think the state of git on Windows is a bit shaky (I'm happy to be 
proved wrong of course), but I think the only seriously active port is 
the msys one.


That, as far as I can tell, uses an msys version of 'sh', so it will be 
perfectly happy with the sh -c ... construct.


There may be a native windows port in existence, but I can't find how to 
build this, and I assume it's going to need Visual Studio, which makes 
it a lot more complex to get going.


The code you were looking at in run-command.c says this:

#ifndef GIT_WINDOWS_NATIVE
nargv[nargc++] = SHELL_PATH;   !GIT_WINDOWS_NATIVE
#else
nargv[nargc++] = sh; GIT_WINDOWS_NATIVE
#endif
nargv[nargc++] = -c;

To me, that seems to imply that for GIT_WINDOWS_NATIVE, we take the 
*second* branch and use sh, so again, the the code as it stands will 
be fine. msysgit uses that path.


(The next line, trying to use -c has no chance of working if Cmd is 
being used).





So something like this may be sufficient, perhaps?

  Makefile  | 1 +
  git-p4.py | 2 +-
  2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 20058f1..fda44bf 100644
--- a/Makefile
+++ b/Makefile
@@ -1776,6 +1776,7 @@ $(SCRIPT_PYTHON_GEN): GIT-CFLAGS GIT-PREFIX 
GIT-PYTHON-VARS
  $(SCRIPT_PYTHON_GEN): % : %.py
$(QUIET_GEN)$(RM) $@ $@+  \
sed -e '1s|#!.*python|#!$(PYTHON_PATH_SQ)|' \
+   -e 's|SHELL_PATH|$(SHELL_PATH_SQ)|g' \
$ $@+  \
chmod +x $@+  \
mv $@+ $@
diff --git a/git-p4.py b/git-p4.py
index de06046..eb6d4b1 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1220,7 +1220,7 @@ class P4Submit(Command, P4UserMap):
  editor = os.environ.get(P4EDITOR)
  else:
  editor = read_pipe(git var GIT_EDITOR).strip()
-system([sh, -c, ('%s $@' % editor), editor, template_file])
+system(['''SHELL_PATH''', -c, ('%s $@' % editor), editor, 
template_file])


This seems to be expanded to '''sh''' which doesn't then work at all. I 
didn't take the time to investigate further though.




  # If the file was not saved, prompt to see if this patch should
  # be skipped.  But skip this verification step if configured so.


I don't think we need to do anything. msysgit works fine with the origin 
sh, -c, ... code.


Thanks!
Luke
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: recovering from unordered stage entries in index error

2015-05-24 Thread Duy Nguyen
On Sat, May 23, 2015 at 9:47 AM, McHenry, Matt
mmche...@carnegielearning.com wrote:
 So maybe you can do GIT_TRACE=2 git svn fetch and post the output.
 I'd expect to see something like git read-tree sha1 before fatal:
 unorder You can then use git ls-tree sha1 to examine this tree,
 try to sort the file list with LANG=C sort and compare with the
 original list.

 There is no read-tree in the output (below).  The sha1 that is 
 mentioned, 74332b7, is the one for the current trunk:

Hm.. neither read-tree nor update-index is in the output. I can see
git-svn closing stderr sometimes, but not sure if that's why we don't
see these commands, or something else. Could you try again with
GIT_TRACE=/absolute/path/to/some/where instead of GIT_TRACE=2 and post
the content of /abso../some/where?
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3 3/3] git-p4: tests: use test-chmtime in place of touch

2015-05-24 Thread Junio C Hamano
Luke Diamand l...@diamand.org writes:

 diff --git a/t/t9813-git-p4-preserve-users.sh 
 b/t/t9813-git-p4-preserve-users.sh
 index 166b840..fe65788 100755
 --- a/t/t9813-git-p4-preserve-users.sh
 +++ b/t/t9813-git-p4-preserve-users.sh
 @@ -53,7 +53,8 @@ test_expect_success 'preserve users' '
   git commit --author Alice al...@example.com -m a change by 
 alice file1 
   git commit --author Bob b...@example.com -m a change by 
 bob file2 
   git config git-p4.skipSubmitEditCheck true 
 - P4EDITOR=touch P4USER=alice P4PASSWD=secret git p4 commit 
 --preserve-user 
 + P4EDITOR=test-chmtime +5 P4USER=alice P4PASSWD=secret 
 + git p4 commit --preserve-user 

I think this hunk is wrong; we need to either change  to \ to make
it a single logical line that exports three environment variables
only to git p4 commit --preserve-user, or insert export P4EDITOR
P4USER P4PASSWD  between these two lines.

The latter seems to be what the remainder of the test is doing, so
I'd fix this up to mimick them.

Sorry for not catching it in the earlier review.

Thanks.

 @@ -69,7 +70,7 @@ test_expect_success 'refuse to preserve users without 
 perms' '
   git config git-p4.skipSubmitEditCheck true 
   echo username-noperms: a change by alice file1 
   git commit --author Alice al...@example.com -m perms: a 
 change by alice file1 
 - P4EDITOR=touch P4USER=bob P4PASSWD=secret 
 + P4EDITOR=test-chmtime +5 P4USER=bob P4PASSWD=secret 
   export P4EDITOR P4USER P4PASSWD 
   test_must_fail git p4 commit --preserve-user 
   ! git diff --exit-code HEAD..p4/master
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] clone: add `--seed` shorthand

2015-05-24 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 Having slept on it, I really think --seed should be fetch from the
 seed into temp refs, and not what I posted earlier.

Yeah, I think that is the right way to do it.

And it happens to mesh well with the (not so well advertised but not
so well hidden) plan to allow loaded server side to advise --seed
from bundle hosted elsewhere, e.g.

 * clone connects to upload-pack

 * upload-pack advertises --seed=http://cdn.github.com/project.bundle
   via capability

 * clone disconnects from upload-pack, and runs (resumable)
   wget to the seed to grab bundle.

 * clone then fetches refs/*:refs/remotes/origin/* from the bundle

 * clone then continues to fetch into +refs/remotes/origin/* as
   usual, but does an equivalent of using --prune for this fetch
   to drop anything extra/stale that the seed bundle may have
   had.

 * optionally clone can immediately repack.

... which I wanted to see happen in a near future.

And that --seed thing that can name a local bundle file is a very
good first step toward the direction, I think.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC 0/3] --seed as an alias for --dissociate --reference

2015-05-24 Thread Michael Haggerty
Thanks for working on this. I have one little bikeshedding comment...

On 05/21/2015 06:14 AM, Jeff King wrote:
 [...]
 There are a few open issues with this series:
 
   1. Assuming that seed is a reasonable verb for this concept, is
  --seed=repo OK for the option?  Would --seed-from=repo be
  better? (Also, the response bleh, seed is a terrible name is
  fine, too, but only if accompanied by your own suggestion :) ).

I think seed is a pretty good name. The only downside is that seed
suggests that the process injects just a few seeds that are much smaller
than the whole. But in fact, (hopefully) this option causes the bulk of
the needed objects to be pre-fetched.

I can't think of any name that is clearly better. Some brainstorming:

* prepare -- meh
* prime (as in prime the pump). But prime alone could have too many
meanings, so...
* prime-from
* pre-fetch or pre-fetch-from
* pre-load or pre-load-from

BTW I think --seed-from=repo is more self-explanatory than
--seed=repo.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 'Minimal' diff-algorithm producing a different result than 'myers', 'patience' and 'histogram' ones

2015-05-24 Thread Jeff King
On Tue, May 12, 2015 at 03:07:46PM +0200, Dmitry Malikov wrote:

 I'm trying to compare 4 different git-diff algorithms and the
 'minimal' one is the most vague and non-obvious. The documentation
 says Spend extra time to make sure the smallest possible diff is
 produced. - that's all.
 
 By any chance, is there any example of diff when 'minimal' algorithm
 produces a different result than a 'myers', 'patience' and 'histogram'
 ones?

I don't know of a simple example offhand, but you can easily generate
all of the patches for a repository with each type by doing:

  for i in myers minimal patience histogram; do
git log --diff-algorithm=$i -p $i
  done

In git.git, the output for each type is distinct. You might also find
this thread interesting:

  http://thread.gmane.org/gmane.comp.version-control.git/143426

It mentions 717b83117 from git.git, whose minimal diff is over 300 lines
shorter than the non-minimal one. It's not exactly a simple example, but
I suspect you won't find a short case; minimal only matters in the
complicated ones.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC 0/3] using stat() to avoid re-scanning pack dir

2015-05-24 Thread Jeff King
On Sat, May 23, 2015 at 08:19:03AM +0700, Duy Nguyen wrote:

 On Sat, May 23, 2015 at 6:51 AM, Jeff King p...@peff.net wrote:
  The other problem is that I'm not sure stat data is enough to notice
  when a directory changes. Certainly the mtime should change, but if you
  have only one-second resolution on your mtimes, we can be fooled.
 
 mtime may or may not change. I based my untracked-cache series
 entirely on this directory mtime and did some research about it. For
 UNIXy filesystems on UNIXy OSes, mtime should work as you expect. FAT
 on Windows does not (but FAT on Linux does..). NTFS works fine
 according to some MS document. No idea about CIFS. But people often
 just do open operation of a time and this racy is not an issue. It is
 only an issue on the busy server side, and you can make sure you run
 on the right fs+OS.

Even on Linux+ext4, I think there is some raciness.

For instance, the program at the end of this mail will loop forever,
running stat on an open directory fd, then enumerating the entries in
the directory.  If we ever get the same stat data as the last iteration
but different contents, then it complains. If you run it alongside
something simple, like touching 10,000 files in the directory, it fails
pretty quickly.

This is because we have no atomicity guarantees with directories. We can
stat() them, and then while we are reading them, somebody else can be
changing the entries. Whether we see the before or after state
depends on the timing.

I'm not 100% sure this translates into real-world problems for
packfiles. If you notice that an object is missing and re-scan the pack
directory (stat-ing it during the re-scan), then the change that made
the object go missing must have happened _before_ the stat, and would
presumably be reflected in it (modulo mtime resolution issues). But I
haven't thought that hard about it yet, and I have a nagging feeling
that there will be problem cases.

It might be that you could get an atomic read of the directory by
doing a stat before and after and making sure they're the same (and if
not, repeating the readdir() calls). But I think that suffers from the
same mtime-resolution problems.

Linux+ext4 _does_ have nanosecond mtimes, which perhaps is enough to
assume that any change will be reflected.

I dunno. I guess the most interesting test would be something closer to
the real world: one process repeatedly making sure the object pointed to
by master exists, and another one constantly rewriting master and
then repacking the object.

-- 8 --
#include cache.h
#include string-list.h

static void get_data(const char *path, struct string_list *entries,
 struct stat_validity *validity)
{
DIR *dir = opendir(path);
struct dirent *de;

stat_validity_update(validity, dirfd(dir));
while ((de = readdir(dir)))
string_list_insert(entries, de-d_name);
closedir(dir);
}

static int list_eq(const struct string_list *a,
const struct string_list *b)
{
int i;

if (a-nr != b-nr)
return 0;
for (i = 0; i  a-nr; i++)
if (strcmp(a-items[i].string, b-items[i].string))
return 0;
return 1;
}

static void monitor(const char *path)
{
struct string_list last_entries = STRING_LIST_INIT_DUP;
struct stat_validity last_validity;

get_data(path, last_entries, last_validity);
while (1) {
struct string_list cur_entries = STRING_LIST_INIT_DUP;
struct stat_validity cur_validity;

get_data(path, cur_entries, cur_validity);
if (!memcmp(last_validity, cur_validity, 
sizeof(last_validity)) 
!list_eq(cur_entries, last_entries))
die(mismatch);

string_list_clear(last_entries, 0);
memcpy(last_entries, cur_entries, sizeof(last_entries));
stat_validity_clear(last_validity);
memcpy(last_validity, cur_validity, sizeof(last_validity));
}
}

int main(int argc, const char **argv)
{
monitor(*++argv);
return 0;
}
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] stat_validity: handle non-regular files

2015-05-24 Thread Jeff King
On Sat, May 23, 2015 at 01:00:06PM +0200, Michael Haggerty wrote:

 I don't have any insight about whether mtimes are reliable change
 indicators for directories.
 
 But if you make this change, you are changing the contract of the
 stat_validity functions:
 
 * Have you verified that no callers rely on the old stat_validity's
 check that the file is a regular file?

I think they are OK if a directory appears (they notice the error in
reading and call stat_validity_clear to throw away the result). But it
would be a problem, I suppose, if you put a FIFO or something into
$GIT_DIR/packed-refs. OTOH, that would suffer from the stat data
changing constantly, so we would fall back to safe behavior.

I don't know how careful we want to be. I guess the conservative choice
would be to barf if it's not a file or directory, both of which can be
handled pretty sanely.

 * The docstrings in cache.h need to be updated.

Thanks, will fix if I re-roll (I'm still not convinced this is a robust
thing to be doing overall).

   struct stat_validity {
  -   struct stat_data *sd;
  +   struct stat_data sd;
  +   unsigned mode;
   };
 
 Could the mode be stored directly in stat_data?

I'd rather not. stat_data is shared with the cache_entry code, which
holds the mode separately. I'd like to avoid impacting the index code at
all.

 By comparing modes, you are not only comparing file type (S_ISREG vs
 S_ISDIR etc.) but also all of the file permissions. That seems OK with
 me but you might want to document that fact. Plus, I don't know whether
 POSIX allows additional, implementation-defined information to be stored
 in st_mode. If so, you would be comparing that, too.

Yeah, I considered that. My feeling is that testing _more_ information
is probably OK. Even if there is extra data there, it presumably doesn't
change from call to call of stat() unless the directory changes. And if
we are wrong, we fail safely (e.g., if the permissions change we don't
technically need to re-read the pack directory, but it is OK to do so).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html