[PATCH 1/1] mingw: initialize HOME on startup

2019-07-04 Thread Karsten Blees via GitGitGadget
From: Karsten Blees 

HOME initialization was historically duplicated in many different places,
including /etc/profile, launch scripts such as git-bash.vbs and gitk.cmd,
and (although slightly broken) in the git-wrapper.

Even unrelated projects such as GitExtensions and TortoiseGit need to
implement the same logic to be able to call git directly.

Initialize HOME in git's own startup code so that we can eventually retire
all the duplicate initialization code.

Signed-off-by: Karsten Blees 
Signed-off-by: Johannes Schindelin 
---
 compat/mingw.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/compat/mingw.c b/compat/mingw.c
index 9b6d2400e1..1047427cb5 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -2299,6 +2299,30 @@ static void setup_windows_environment(void)
/* simulate TERM to enable auto-color (see color.c) */
if (!getenv("TERM"))
setenv("TERM", "cygwin", 1);
+
+   /* calculate HOME if not set */
+   if (!getenv("HOME")) {
+   /*
+* try $HOMEDRIVE$HOMEPATH - the home share may be a network
+* location, thus also check if the path exists (i.e. is not
+* disconnected)
+*/
+   if ((tmp = getenv("HOMEDRIVE"))) {
+   struct strbuf buf = STRBUF_INIT;
+   strbuf_addstr(&buf, tmp);
+   if ((tmp = getenv("HOMEPATH"))) {
+   strbuf_addstr(&buf, tmp);
+   if (is_directory(buf.buf))
+   setenv("HOME", buf.buf, 1);
+   else
+   tmp = NULL; /* use $USERPROFILE */
+   }
+   strbuf_release(&buf);
+   }
+   /* use $USERPROFILE if the home share is not available */
+   if (!tmp && (tmp = getenv("USERPROFILE")))
+   setenv("HOME", tmp, 1);
+   }
 }
 
 /*
-- 
gitgitgadget


[PATCH v2 1/1] gettext: always use UTF-8 on native Windows

2019-07-03 Thread Karsten Blees via GitGitGadget
From: Karsten Blees 

On native Windows, Git exclusively uses UTF-8 for console output (both
with MinTTY and native Win32 Console). Gettext uses `setlocale()` to
determine the output encoding for translated text, however, MSVCRT's
`setlocale()` does not support UTF-8. As a result, translated text is
encoded in system encoding (as per `GetAPC()`), and non-ASCII chars are
mangled in console output.

Side note: There is actually a code page for UTF-8: 65001. In practice,
it does not work as expected at least on Windows 7, though, so we cannot
use it in Git. Besides, if we overrode the code page, any process
spawned from Git would inherit that code page (as opposed to the code
page configured for the current user), which would quite possibly break
e.g. diff or merge helpers. So we really cannot override the code page.

In `init_gettext_charset()`, Git calls gettext's
`bind_textdomain_codeset()` with the character set obtained via
`locale_charset()`; Let's override that latter function to force the
encoding to UTF-8 on native Windows.

In Git for Windows' SDK, there is a `libcharset.h` and therefore we
define `HAVE_LIBCHARSET_H` in the MINGW-specific section in
`config.mak.uname`, therefore we need to add the override before that
conditionally-compiled code block.

Rather than simply defining `locale_charset()` to return the string
`"UTF-8"`, though, we are careful not to break `LC_ALL=C`: the
`ab/no-kwset` patch series, for example, needs to have a way to prevent
Git from expecting UTF-8-encoded input.

Signed-off-by: Karsten Blees 
Signed-off-by: Johannes Schindelin 
---
 gettext.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/gettext.c b/gettext.c
index d4021d690c..3f2aca5c3b 100644
--- a/gettext.c
+++ b/gettext.c
@@ -12,7 +12,25 @@
 #ifndef NO_GETTEXT
 #  include 
 #  include 
-#  ifdef HAVE_LIBCHARSET_H
+#  ifdef GIT_WINDOWS_NATIVE
+
+static const char *locale_charset(void)
+{
+   const char *env = getenv("LC_ALL"), *dot;
+
+   if (!env || !*env)
+   env = getenv("LC_CTYPE");
+   if (!env || !*env)
+   env = getenv("LANG");
+
+   if (!env)
+   return "UTF-8";
+
+   dot = strchr(env, '.');
+   return !dot ? env : dot + 1;
+}
+
+#  elif defined HAVE_LIBCHARSET_H
 #  include 
 #  else
 #  include 
-- 
gitgitgadget


[PATCH 1/1] gettext: always use UTF-8 on native Windows

2019-06-27 Thread Karsten Blees via GitGitGadget
From: Karsten Blees 

Git on native Windows exclusively uses UTF-8 for console output (both with
MinTTY and native Console windows). Gettext uses setlocale() to determine
the output encoding for translated text, however, MSVCRT's setlocale()
doesn't support UTF-8. As a result, translated text is encoded in system
encoding (GetAPC()), and non-ASCII chars are mangled in console output.

Use gettext's bind_textdomain_codeset() to force the encoding to UTF-8 on
native Windows.

In this developers' setup, HAVE_LIBCHARSET_H is apparently defined, but
we *really* want to override the locale_charset() here.

Signed-off-by: Karsten Blees 
Signed-off-by: Johannes Schindelin 
---
 gettext.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gettext.c b/gettext.c
index d4021d690c..d8423e5c41 100644
--- a/gettext.c
+++ b/gettext.c
@@ -12,7 +12,9 @@
 #ifndef NO_GETTEXT
 #  include 
 #  include 
-#  ifdef HAVE_LIBCHARSET_H
+#  ifdef GIT_WINDOWS_NATIVE
+#  define locale_charset() "UTF-8"
+#  elif defined HAVE_LIBCHARSET_H
 #  include 
 #  else
 #  include 
-- 
gitgitgadget


[PATCH 2/3] mingw: replace MSVCRT's fstat() with a Win32-based implementation

2018-10-23 Thread Karsten Blees via GitGitGadget
From: Karsten Blees 

fstat() is the only stat-related CRT function for which we don't have a
full replacement yet (and thus the only reason to stick with MSVCRT's
'struct stat' definition).

Fully implement fstat(), in preparation of implementing a POSIX 2013
compatible 'struct stat' with nanosecond-precision file times.

This allows us also to implement some clever code to handle pipes and
character devices in our own way.

Signed-off-by: Karsten Blees 
Signed-off-by: Johannes Schindelin 
---
 compat/mingw.c | 31 +--
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index d2e7d86db..07fc0b79a 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -771,20 +771,31 @@ int mingw_stat(const char *file_name, struct stat *buf)
 int mingw_fstat(int fd, struct stat *buf)
 {
HANDLE fh = (HANDLE)_get_osfhandle(fd);
+   DWORD avail, type = GetFileType(fh) & ~FILE_TYPE_REMOTE;
 
-   if (fh == INVALID_HANDLE_VALUE) {
-   errno = EBADF;
-   return -1;
-   }
-   /* direct non-file handles to MS's fstat() */
-   if (GetFileType(fh) != FILE_TYPE_DISK)
-   return _fstati64(fd, buf);
+   switch (type) {
+   case FILE_TYPE_DISK:
+   return get_file_info_by_handle(fh, buf);
 
-   if (!get_file_info_by_handle(fh, buf))
+   case FILE_TYPE_CHAR:
+   case FILE_TYPE_PIPE:
+   /* initialize stat fields */
+   memset(buf, 0, sizeof(*buf));
+   buf->st_nlink = 1;
+
+   if (type == FILE_TYPE_CHAR) {
+   buf->st_mode = _S_IFCHR;
+   } else {
+   buf->st_mode = _S_IFIFO;
+   if (PeekNamedPipe(fh, NULL, 0, NULL, &avail, NULL))
+   buf->st_size = avail;
+   }
return 0;
 
-   errno = EBADF;
-   return -1;
+   default:
+   errno = EBADF;
+   return -1;
+   }
 }
 
 static inline void time_t_to_filetime(time_t t, FILETIME *ft)
-- 
gitgitgadget



[PATCH 3/3] mingw: implement nanosecond-precision file times

2018-10-23 Thread Karsten Blees via GitGitGadget
From: Karsten Blees 

We no longer use any of MSVCRT's stat-functions, so there's no need to
stick to a CRT-compatible 'struct stat' either.

Define and use our own POSIX-2013-compatible 'struct stat' with nanosecond-
precision file times.

Note: This can cause performance issues when using Git variants with
different file time resolutions, as the timestamps are stored in the Git
index: after updating the index with a Git variant that uses
second-precision file times, a nanosecond-aware Git will think that
pretty much every single file listed in the index is out of date.

Signed-off-by: Karsten Blees 
Signed-off-by: Johannes Schindelin 
---
 compat/mingw.c   | 18 ++
 compat/mingw.h   | 36 ++--
 config.mak.uname |  2 --
 3 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 07fc0b79a..26016d02e 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -592,9 +592,11 @@ static inline long long filetime_to_hnsec(const FILETIME 
*ft)
return winTime - 1164447360LL;
 }
 
-static inline time_t filetime_to_time_t(const FILETIME *ft)
+static inline void filetime_to_timespec(const FILETIME *ft, struct timespec 
*ts)
 {
-   return (time_t)(filetime_to_hnsec(ft) / 1000);
+   long long hnsec = filetime_to_hnsec(ft);
+   ts->tv_sec = (time_t)(hnsec / 1000);
+   ts->tv_nsec = (hnsec % 1000) * 100;
 }
 
 /**
@@ -653,9 +655,9 @@ static int do_lstat(int follow, const char *file_name, 
struct stat *buf)
buf->st_size = fdata.nFileSizeLow |
(((off_t)fdata.nFileSizeHigh)<<32);
buf->st_dev = buf->st_rdev = 0; /* not used by Git */
-   buf->st_atime = filetime_to_time_t(&(fdata.ftLastAccessTime));
-   buf->st_mtime = filetime_to_time_t(&(fdata.ftLastWriteTime));
-   buf->st_ctime = filetime_to_time_t(&(fdata.ftCreationTime));
+   filetime_to_timespec(&(fdata.ftLastAccessTime), 
&(buf->st_atim));
+   filetime_to_timespec(&(fdata.ftLastWriteTime), &(buf->st_mtim));
+   filetime_to_timespec(&(fdata.ftCreationTime), &(buf->st_ctim));
if (fdata.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) {
WIN32_FIND_DATAW findbuf;
HANDLE handle = FindFirstFileW(wfilename, &findbuf);
@@ -753,9 +755,9 @@ static int get_file_info_by_handle(HANDLE hnd, struct stat 
*buf)
buf->st_size = fdata.nFileSizeLow |
(((off_t)fdata.nFileSizeHigh)<<32);
buf->st_dev = buf->st_rdev = 0; /* not used by Git */
-   buf->st_atime = filetime_to_time_t(&(fdata.ftLastAccessTime));
-   buf->st_mtime = filetime_to_time_t(&(fdata.ftLastWriteTime));
-   buf->st_ctime = filetime_to_time_t(&(fdata.ftCreationTime));
+   filetime_to_timespec(&(fdata.ftLastAccessTime), &(buf->st_atim));
+   filetime_to_timespec(&(fdata.ftLastWriteTime), &(buf->st_mtim));
+   filetime_to_timespec(&(fdata.ftCreationTime), &(buf->st_ctim));
return 0;
 }
 
diff --git a/compat/mingw.h b/compat/mingw.h
index 571019d0b..9419b27e1 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -327,18 +327,41 @@ static inline int getrlimit(int resource, struct rlimit 
*rlp)
 }
 
 /*
- * Use mingw specific stat()/lstat()/fstat() implementations on Windows.
+ * Use mingw specific stat()/lstat()/fstat() implementations on Windows,
+ * including our own struct stat with 64 bit st_size and nanosecond-precision
+ * file times.
  */
 #ifndef __MINGW64_VERSION_MAJOR
 #define off_t off64_t
 #define lseek _lseeki64
+struct timespec {
+   time_t tv_sec;
+   long tv_nsec;
+};
 #endif
 
-/* use struct stat with 64 bit st_size */
+struct mingw_stat {
+_dev_t st_dev;
+_ino_t st_ino;
+_mode_t st_mode;
+short st_nlink;
+short st_uid;
+short st_gid;
+_dev_t st_rdev;
+off64_t st_size;
+struct timespec st_atim;
+struct timespec st_mtim;
+struct timespec st_ctim;
+};
+
+#define st_atime st_atim.tv_sec
+#define st_mtime st_mtim.tv_sec
+#define st_ctime st_ctim.tv_sec
+
 #ifdef stat
 #undef stat
 #endif
-#define stat _stati64
+#define stat mingw_stat
 int mingw_lstat(const char *file_name, struct stat *buf);
 int mingw_stat(const char *file_name, struct stat *buf);
 int mingw_fstat(int fd, struct stat *buf);
@@ -351,13 +374,6 @@ int mingw_fstat(int fd, struct stat *buf);
 #endif
 #define lstat mingw_lstat
 
-#ifndef _stati64
-# define _stati64(x,y) mingw_stat(x,y)
-#elif defined (_USE_32BIT_TIME_T)
-# define _stat32i64(x,y) mingw_stat(x,y)
-#else
-# define _stat64(x,y) mingw_stat(x,y)
-#endif
 
 int mingw_utime(const char *file_name, const struct utimbuf *times);
 #define utime mingw_utime
diff --git a/config.mak.una

Re: [PATCH v2] hashmap API: introduce for_each_hashmap_entry() helper macro

2016-03-19 Thread Karsten Blees
Am 17.03.2016 um 11:38 schrieb Alexander Kuleshov:

> This patch introduces the for_each_hashmap_entry() macro for more

I'd rather call it 'hashmap_for_each', following the pattern
'operandtype_operation' used throughout git. E.g. we already have
'hashmap_get', not 'get_hashmap_entry'.

I realize that existing *for_each* implementations in the git code
base are a bit of a mess (except 'sha1_array_for_each_unique'). E.g.
there is 'for_each_string_list' and 'for_each_string_list_item'. Both
loop over the string_list_items of a string_list, but one is named
after the collection type, the other after the item type...IMO this
shouldn't set an example for future code.

The rest of the patch looks good to me.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] hashmap API: introduce for_each_hashmap_entry() helper macro

2016-03-19 Thread Karsten Blees
Am 16.03.2016 um 17:39 schrieb Alexander Kuleshov:

> There is common pattern to traverse a hashmap in git source code:
> 
> hashmap_iter_init(map, &iter);
> while ((entry = hashmap_iter_next(&iter)))
>  // do something with entry
> 

The hashmap_iter_first() function allows you to do this instead:

for (entry = hashmap_iter_first(map, &iter); entry; entry = 
hashmap_iter_next(&iter))
doSomething(entry);

With an appropriate macro definition, this could be simplified to:

#define hashmap_for_each(map, iter, entry) for (entry = 
hashmap_iter_first(map, iter); entry; entry = hashmap_iter_next(iter))
...
hashmap_for_each(map, &iter, entry)
doSomething(entry);

You would still need to declare the 'iter' and 'entry' variables, but
there is no danger of decl-after-statement or variable shadowing
mentioned by Junio. That is, you can do this:

hashmap_for_each(map, &iter, entry)
if (checkCondition(entry))
break;
// work with found entry

Or even this:

hashmap_for_each(map, &iter1, entry1)
hashmap_for_each(map, &iter2, entry2)
doSomething(entry1, entry2);
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: broken racy detection and performance issues with nanosecond file times

2015-09-29 Thread Karsten Blees
Am 28.09.2015 um 19:38 schrieb Junio C Hamano:
> Karsten Blees  writes:
> 
>> Problem 1: Failure to detect racy files (without USE_NSEC)
>> ==
>>
>> Git may not detect racy changes when 'update-index' runs in parallel
>> to work tree updates.
>>
>> Consider this (where timestamps are t.):
>>
>>  t0.0$ echo "foo" > file1
>>  t0.1$ git update-index file1 &  # runs in background
> 
> I just wonder after looking at the ampersand here ...
> 
>> Please let me know what you think of this...maybe I've completely
>> screwed up and can no longer see the forest for all the trees.
> 
> ... if your task would become much simpler if you declare "once you
> give Git the control, do not muck with the repository until you get
> the control back".
> 

This is just to illustrate the problem. GUI-based applications will
often do things in the background that you cannot control. E.g. gitk,
git gui, Eclipse or TortoiseGit don't tell you when and how long you
shouldn't touch the working copy. At the same time, IntelliJ IDEA and
most office suits have the auto-save feature turned on by default,
and you cannot tell them when *not* to auto-save.

It may still be quite unlikely that this happens (you need two
changes within a second, without changing the file size), but *if*
it happens, the user may not even notice. And as git trusts the
false stat data blindly, the problem won't go away automatically.
You can mark all entries racy by setting index mtime to some value
far in the past, but this implies that you noticed that something
was wrong...

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] read-cache: fix file time comparisons with different precisions

2015-09-29 Thread Karsten Blees
Am 28.09.2015 um 14:52 schrieb Johannes Schindelin:
> Otherwise there would be that little loop-hole where (nsec % 1000) == 0 *by 
> chance* and we assume the timestamps to be identical even if they are not.

Yeah, but in this case the file would be racy, as racy-checks use
the same comparison now.

IMO change detection is so fundamental that it should Just Work,
without having a plethora of config options that we need to explain
to end users.

If that means that once in a million cases we need an extra content
check to revalidate such falsely racy entries, that's fine with me.

Cheers,
Karsten
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH/RFC] read-cache: fix file time comparisons with different precisions

2015-09-28 Thread Karsten Blees
Different git variants record file times in the index with different
precisions, according to their capabilities. E.g. git compiled with NO_NSEC
records seconds only, JGit records the mtime in milliseconds, but leaves
ctime blank (because ctime is unavailable in Java).

This causes performance issues in git compiled with USE_NSEC, because index
entries with such 'incomplete' timestamps are considered dirty, triggering
unnecessary content checks.

Add a file time comparison function that auto-detects the precision based
on the number of trailing 0 digits, and compares with the lower precision
of both values. This initial version supports the known precisions seconds
(git + NO_NSEC), milliseconds (JGit) and nanoseconds (git + USE_NSEC), but
can be easily extended to e.g. microseconds.

Use the new comparison function in both dirty and racy checks. As a side
effect, this fixes racy detection in USE_NSEC-enabled git with
core.checkStat=minimal, as the coreStat setting now affects racy checks as
well.

Finally, do not check ctime if ctime.sec is 0 (as recorded by JGit).

Signed-off-by: Karsten Blees 
---
 read-cache.c | 62 
 1 file changed, 41 insertions(+), 21 deletions(-)


This patch fixes problems 3 and 4, by trying to auto-detect the recorded file
time precision.


diff --git a/read-cache.c b/read-cache.c
index 87204a5..3a4e6cd 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -99,23 +99,50 @@ void fill_stat_data(struct stat_data *sd, struct stat *st)
sd->sd_size = st->st_size;
 }
 
+/*
+ * Compares two file times. Returns 0 if equal, <0 if t1 < t2, >0 if t1 > t2.
+ * Auto-detects precision based on trailing 0 digits. Compares seconds only if
+ * core.checkStat=minimal.
+ */
+static inline int cmp_filetime(uint32_t t1_sec, uint32_t t1_nsec,
+  uint32_t t2_sec, uint32_t t2_nsec) {
+#ifdef USE_NSEC
+   /*
+* Compare seconds and return result if different, or checkStat=mimimal,
+* or one of the time stamps has second precision only (nsec == 0).
+*/
+   int diff = t1_sec - t2_sec;
+   if (diff || !check_stat || !t1_nsec || !t2_nsec)
+   return diff;
+
+   /*
+* Check if one of the time stamps has millisecond precision only (i.e.
+* the trailing 6 digits are 0). First check the trailing 6 bits so that
+* we only do (slower) modulo division if necessary.
+*/
+   if ((!(t1_nsec & 0x3f) && !(t1_nsec % 100)) ||
+   (!(t2_nsec & 0x3f) && !(t2_nsec % 100)))
+   /* Compare milliseconds. */
+   return (t1_nsec - t2_nsec) / 100;
+
+   /* Compare nanoseconds */
+   return t1_nsec - t2_nsec;
+#else
+   return t1_sec - t2_sec;
+#endif
+}
+
 int match_stat_data(const struct stat_data *sd, struct stat *st)
 {
int changed = 0;
 
-   if (sd->sd_mtime.sec != (unsigned int)st->st_mtime)
-   changed |= MTIME_CHANGED;
-   if (trust_ctime && check_stat &&
-   sd->sd_ctime.sec != (unsigned int)st->st_ctime)
-   changed |= CTIME_CHANGED;
-
-#ifdef USE_NSEC
-   if (check_stat && sd->sd_mtime.nsec != ST_MTIME_NSEC(*st))
+   if (cmp_filetime(sd->sd_mtime.sec, sd->sd_mtime.nsec,
+(unsigned) st->st_mtime, ST_MTIME_NSEC(*st)))
changed |= MTIME_CHANGED;
-   if (trust_ctime && check_stat &&
-   sd->sd_ctime.nsec != ST_CTIME_NSEC(*st))
+   if (trust_ctime && check_stat && sd->sd_ctime.sec &&
+   cmp_filetime(sd->sd_ctime.sec, sd->sd_ctime.nsec,
+(unsigned) st->st_ctime, ST_CTIME_NSEC(*st)))
changed |= CTIME_CHANGED;
-#endif
 
if (check_stat) {
if (sd->sd_uid != (unsigned int) st->st_uid ||
@@ -276,16 +303,9 @@ static int ce_match_stat_basic(const struct cache_entry 
*ce, struct stat *st)
 static int is_racy_stat(const struct index_state *istate,
const struct stat_data *sd)
 {
-   return (istate->timestamp.sec &&
-#ifdef USE_NSEC
-/* nanosecond timestamped files can also be racy! */
-   (istate->timestamp.sec < sd->sd_mtime.sec ||
-(istate->timestamp.sec == sd->sd_mtime.sec &&
- istate->timestamp.nsec <= sd->sd_mtime.nsec))
-#else
-   istate->timestamp.sec <= sd->sd_mtime.sec
-#endif
-   );
+   return istate->timestamp.sec &&
+  (cmp_filetime(istate->timestamp.sec, istate->timestamp.nsec,
+sd->sd_mtime.sec, sd->sd_mtime.nsec) <= 0);
 }
 
 static int is_racy_timestamp(const struct index_state *istate,
-- 
2.1.0.msysgit.0

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


broken racy detection and performance issues with nanosecond file times

2015-09-25 Thread Karsten Blees
Hi there,

I think I found a few nasty problems with racy detection, as well as
performance issues when using git implementations with different file
time resolutions on the same repository (e.g. git compiled with and
without USE_NSEC, libgit2 compiled with and without USE_NSEC, JGit
executed in different Java implementations...).

Let me start by listing relevant file time gotchas (skip this if it
sounds too familiar) before diving into problem descriptions. Some
ideas for potential solutions are at the end.


Notable file time facts:


The st_ctime discrepancy:
* stat.st_ctime means "change time" (of file metadata) on POSIX
  systems and "creation time" on Windows
* While some file systems may track all four time stamps (mtime,
  atime, change time and creation time), there are no public OS APIs
  to obtain creation time on POSIX / change time on Windows.

Linux:
* In-core file times may not be properly rounded to on-disk
  precision, causing spurious file time changes when the cache is
  refreshed from disk. This was fixed for typical Unix file systems
  in kernel 2.6.11. The fix for CEPH, CIFS, NTFS, UFS and FUSE will
  be in kernel 4.3. There's no fix for FAT-based file systems yet.
* Maximum file time precision is 1 ns (or 1 s with really old glibc).

Windows:
* Maximum file time precision is 100 ns.

Java <= 6:
* Only exposes mtime in milliseconds (via File.getLastModifiedTime).

Java >= 7:
* Only exposes mtime, atime and creation time, no change time (see
  java.nio.file.attribute.BasicFileAttributes).
* Maximum file time precision is implementation specific (OpenJDK:
  1 microsecond on both Unix [1] and Windows [2]).
* On platforms or file systems that don't support creation time,
  BasicFileAttribtes.creationTime() is implementation specific
  (OpenJDK returns mtime instead). There's no public API to detect
  whether creation time is supported or "emulated" in some way.

Git Options:
* NO_NSEC (git only): compile-time option that disables recording of
  nanoseconds in the index, implies USE_NSEC=false.
* USE_NSEC (git and libgit2 with [3]): compile-time option that
  enables nanosecond comparison in both up-to-date and racy checks.
* core.checkStat=minimal (git, libgit2, JGit): config-option that
  disables nanosecond comparison in up-to-date checks, but not in
  racy checks.

JGit:
* Only uses mtime, rounded to milliseconds. While there is a
  DirCacheEntry.setCreationTime() [4] to set the index entry's ctime
  field, AFAICT its not used anywhere.
* Does not compare nanoseconds if the cached value recorded in the
  index is 0, to prevent performance issues with NO_NSEC git
  implementations [5].


Problem 1: Failure to detect racy files (without USE_NSEC)
==

Git may not detect racy changes when 'update-index' runs in parallel
to work tree updates.

Consider this (where timestamps are t.):

 t0.0$ echo "foo" > file1
 t0.1$ git update-index file1 &  # runs in background
 t0.2$ # update-index records stats and sha1 of file1 in new index
 t0.3$ echo "bar" > file1
 $ # update-index writes other index entries
 t1.0$ # update-index finishes (sets mtime of the new index to t1.0!)
 t1.1$ git status # doesn't detect that file1 has changed

The problem here is that racy checks in 'git status' compare against
the new index file's mtime (t1.0), which may be newer than the last
change of file1.


Problem 2: Failure to detect racy files (mixed USE_NSEC)


Git may fail to detect racy conditions if file times in .git/index
have been recorded by another git implementation with better file
time resolution.

Consider the following sequence:

 t0.0$ echo "foo" > file1
 t0.1$ use-nsec-git update-index file1
 t0.2$ echo "bar" > file1
 $ sleep 1
 t1.0$ touch file2
 t1.1$ use-nsec-git status # rewrites index, to store file2 change
 t1.2$ git status # doesn't detect that file1 has changed

The problem here is that the first, nsec-enabled 'git status' does
not consider file1 racy (with nanosecond precision, the file is dirty
already (t0.0 != t0.2), so no racy-checks are performed). Thus, it
will not squash the size field (as a second-precision-git would).
However, it will rewrite the index to capture the status change of
file2, and thus create a new index file with mtime = t1.1. Similar
to problem 1, subsequent 'git status' with second-precision has no
way to detect that file1 has changed.

This problem would not be limited to USE_NSEC-enabled/disabled git,
it occurs whenever different file time resolutions are at play, e.g.:
 * second-based git vs. millisecond-based JGit
 * millisecond-based JGit vs. nanosecond-enabled git
 * GIT_WORK_TREE on ext2 (1 s) and GIT_DIR on ext4 (1 ns)
 * JGit executed by different Java implementations (with different
   file time resolutions)


Problem 3: Failure to detect racy files with core.checkStat=minimal
===

Re: Git doesn't detect change, if file modification time is restored to original one

2015-07-23 Thread Karsten Blees
Am 23.07.2015 um 16:53 schrieb Konstantin Khomoutov:
> On Thu, 23 Jul 2015 11:14:11 +0200
> Konrád Lőrinczi  wrote:
> 
> [...]
>> I accept these solutions as workarounds, but the real solution would
>> be: Dev suggestions:
>> 1) Add a --force-reread option to git status, so user can force
>> reread tree. git status --force-reread
>>
>> 2) Add status.force-reread (true or false) option to .git/config so
>> user can set this variable permanently for a repo.
>> status.force-reread = false (should be default)
>>
>> Could be possible to implement 1) and 2) features to next git release?
> 
> Could you explain what's your real use case with preserving mtimes
> while changing the files?  I mean, implementing "mtime-stability"
> in your tools appears to be a good excersize in programming but what
> real-world problem does it solve?
> 

I'd like to add that this is not a git-specific problem: resetting mtime
on purpose will fool lots of programs, including backup software, file
synchronization tools (rsync, xcopy /D), build systems (make), and web
servers / proxies (If-Modified-Since requests).

So you would typically reset mtime if you *want* programs to ignore the
changes.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] Documentation/i18n.txt: clarify character encoding support

2015-07-01 Thread Karsten Blees
As a "distributed" VCS, git should better define the encodings of its core
textual data structures, in particular those that are part of the network
protocol.

That git is encoding agnostic is only really true for blob objects. E.g.
the 'non-NUL bytes' requirement of tree and commit objects excludes
UTF-16/32, and the special meaning of '/' in the index file as well as
space and linefeed in commit objects eliminates EBCDIC and other non-ASCII
encodings.

Git expects bytes < 0x80 to be pure ASCII, thus CJK encodings that partly
overlap with the ASCII range are problematic as well. E.g. fmt_ident()
removes trailing 0x5C from user names on the assumption that it is ASCII
'\'. However, there are over 200 GBK double byte codes that end in 0x5C.

UTF-8 as default encoding on Linux and respective path translations in the
Mac and Windows versions have established UTF-8 NFC as de-facto standard
for path names.

Update the documentation in i18n.txt to reflect the current status-quo.

Signed-off-by: Karsten Blees 
---

Sorry for the delay, got swamped with other stuff...

Am 17.06.2015 um 22:45 schrieb Junio C Hamano:
> 
> ... I am OK to describe "pathnames are mangled into UTF-8 NFC on
> certain filesystems" as a warning.  I am OK if we encourage the use
> of UTF-8, especially if a project wants to be forward looking
> (i.e. it may currently be a monoculture but may become cross
> platform in the future).  I just do not want to see us saying "you
> *must* encode your path in UTF-8 NFC".
> 
...
> Yes, that is exatly what I said, isn't it?  "Use whatever works for
> your project, we do not dictate."


IMO we *have* to clearly specify an encoding. This freedom of choice
you're proclaiming just does not work in reality.

E.g. Git for Windows prior to 1.7.10 recorded file names in Windows
system encoding, which was perfectly legitimate according to the
documentation. Yet we had numerous bug reports regarding file name
encoding problems (you couldn't even share repos across different
Windows versions, let alone with Linux / Mac / JGit...).

You cannot simply tell users that this is because of Git's superior,
flexible design and its their own fault...except of course if you
want them to switch to VCSes that *do* properly define their file
formats and network protocols - such as subversion or bazaar.
(sorry for the sarcasm, couldn't resist)

I think its important to realize that specifying an encoding is
*not* a limitation - on the contrary: it *enables* us to do things
that would be impossible if file names were just "uninterpreted
sequences of non-NUL bytes". This includes features that are so
fundamental that we take them for granted, e.g. displaying file
names using *real* characters rather than just octal escapes.


I've rewritten the path name paragraph to better describe the
problems to expect with legacy encodings. I hope you like this
version better.

Of course, it would be nice to hear other opinions as well - this
probably shouldn't be a discussion between the two of us :-)

Karsten



 Documentation/i18n.txt | 33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/Documentation/i18n.txt b/Documentation/i18n.txt
index e9a1d5d..2dd79db 100644
--- a/Documentation/i18n.txt
+++ b/Documentation/i18n.txt
@@ -1,18 +1,31 @@
-At the core level, Git is character encoding agnostic.
-
- - The pathnames recorded in the index and in the tree objects
-   are treated as uninterpreted sequences of non-NUL bytes.
-   What readdir(2) returns are what are recorded and compared
-   with the data Git keeps track of, which in turn are expected
-   to be what lstat(2) and creat(2) accepts.  There is no such
-   thing as pathname encoding translation.
+Git is to some extent character encoding agnostic.
 
  - The contents of the blob objects are uninterpreted sequences
of bytes.  There is no encoding translation at the core
level.
 
- - The commit log messages are uninterpreted sequences of non-NUL
-   bytes.
+ - Path names are encoded in UTF-8 normalization form C. This
+   applies to tree objects, the index file, ref names, as well as
+   path names in command line arguments, environment variables
+   and config files (`.git/config` (see linkgit:git-config[1]),
+   linkgit:gitignore[5], linkgit:gitattributes[5] and
+   linkgit:gitmodules[5]).
++
+Note that Git at the core level treats path names simply as
+sequences of non-NUL bytes, there are no path name encoding
+conversions (except on Mac and Windows). Therefore, using
+non-ASCII path names will mostly work even on platforms and file
+systems that use legacy extended ASCII encodings. However,
+repositories created on such systems will not work properly on
+UTF-8-based systems (e.g. Linux, Mac, Windows) and vice versa.
+Additionally, many Git-based tools simply assume path names to

[PATCH v2] Makefile / racy-git.txt: clarify USE_NSEC prerequisites

2015-07-01 Thread Karsten Blees
Signed-off-by: Karsten Blees 
---

...just changed wording as you suggested.

 Documentation/technical/racy-git.txt | 8 ++--
 Makefile | 9 +
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/Documentation/technical/racy-git.txt 
b/Documentation/technical/racy-git.txt
index 242a044..4a8be4d 100644
--- a/Documentation/technical/racy-git.txt
+++ b/Documentation/technical/racy-git.txt
@@ -41,13 +41,17 @@ With a `USE_STDEV` compile-time option, `st_dev` is also
 compared, but this is not enabled by default because this member
 is not stable on network filesystems.  With `USE_NSEC`
 compile-time option, `st_mtim.tv_nsec` and `st_ctim.tv_nsec`
-members are also compared, but this is not enabled by default
+members are also compared. On Linux, this is not enabled by default
 because in-core timestamps can have finer granularity than
 on-disk timestamps, resulting in meaningless changes when an
 inode is evicted from the inode cache.  See commit 8ce13b0
 of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
 ([PATCH] Sync in core time granularity with filesystems,
-2005-01-04).
+2005-01-04). This patch is included in kernel 2.6.11 and newer, but
+only fixes the issue for file systems with exactly 1 ns or 1 s
+resolution. Other file systems are still broken in current Linux
+kernels (e.g. CEPH, CIFS, NTFS, UDF), see
+https://lkml.org/lkml/2015/6/9/714
 
 Racy Git
 
diff --git a/Makefile b/Makefile
index 54ec511..46d181a 100644
--- a/Makefile
+++ b/Makefile
@@ -217,10 +217,11 @@ all::
 # as the compiler can crash (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49299)
 #
 # Define USE_NSEC below if you want git to care about sub-second file mtimes
-# and ctimes. Note that you need recent glibc (at least 2.2.4) for this, and
-# it will BREAK YOUR LOCAL DIFFS! show-diff and anything using it will likely
-# randomly break unless your underlying filesystem supports those sub-second
-# times (my ext3 doesn't).
+# and ctimes. Note that you need recent glibc (at least 2.2.4) for this. On
+# Linux, kernel 2.6.11 or newer is required for reliable sub-second file times
+# on file systems with exactly 1 ns or 1 s resolution. If you intend to use Git
+# on other file systems (e.g. CEPH, CIFS, NTFS, UDF), don't enable USE_NSEC. 
See
+# Documentation/technical/racy-git.txt for details.
 #
 # Define USE_ST_TIMESPEC if your "struct stat" uses "st_ctimespec" instead of
 # "st_ctim"
-- 
2.4.3.windows.1.1.g87477f9

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] config.c: fix writing config files on Windows network shares

2015-06-30 Thread Karsten Blees
Renaming to an existing file doesn't work on Windows network shares if the
target file is open.

munmap() the old config file before commit_lock_file.

Signed-off-by: Karsten Blees 
---

See https://github.com/git-for-windows/git/issues/226

Strangely, renaming to an open file works fine on local disks...

 config.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/config.c b/config.c
index 07133ef..3a23c11 100644
--- a/config.c
+++ b/config.c
@@ -2153,6 +2153,9 @@ int git_config_set_multivar_in_file(const char 
*config_filename,
  contents_sz - copy_begin) <
contents_sz - copy_begin)
goto write_err_out;
+
+   munmap(contents, contents_sz);
+   contents = NULL;
}
 
if (commit_lock_file(lock) < 0) {
-- 
2.4.3.windows.1.1.g87477f9


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Documentation/i18n.txt: clarify character encoding support

2015-06-15 Thread Karsten Blees
Am 15.06.2015 um 02:12 schrieb Junio C Hamano:
> Karsten Blees  writes:
> 
>> diff --git a/Documentation/i18n.txt b/Documentation/i18n.txt
>> index e9a1d5d..e5f6233 100644
>> --- a/Documentation/i18n.txt
>> +++ b/Documentation/i18n.txt
>> @@ -1,18 +1,28 @@
>> -At the core level, Git is character encoding agnostic.
>> -
>> - - The pathnames recorded in the index and in the tree objects
>> -   are treated as uninterpreted sequences of non-NUL bytes.
>> -   What readdir(2) returns are what are recorded and compared
>> -   with the data Git keeps track of, which in turn are expected
>> -   to be what lstat(2) and creat(2) accepts.  There is no such
>> -   thing as pathname encoding translation.
>> +Git is to some extent character encoding agnostic.
> 
> I do not think the removal of the text makes much sense here unless
> you add the equivalent to the new text below.
> 
>>   - The contents of the blob objects are uninterpreted sequences
>> of bytes.  There is no encoding translation at the core
>> level.
>>  
>> - - The commit log messages are uninterpreted sequences of non-NUL
>> -   bytes.
>> + - Pathnames are encoded in UTF-8 normalization form C. This
> 
> That is true only on some systems like OSX (with HFS+) and Windows,
> no?  BSDs in general and Linux do not do any such mangling IIRC.

Modern Unices don't need any such mangling because UTF-8 NFC should
be the default system encoding. I'm not sure for BSDs, but it has
been the default on all major Linux distros for more than 10 years.

> I
> am OK with mangling described as a notable oddball to warn users,
> though; i.e. not as a norm as your new text suggests but as an
> exception.
> 

I would guess that non-UTF-8 Unices (or file systems) are the oddball
case, which is why I described them last. But I could be wrong.

>> +   platforms. If file system APIs don't use UTF-8 (which may be
>> +   file system specific), it is recommended to stick to pure
>> +   ASCII file names.
> 
> Hmph, who endorsed such a recommendation?  It is recommended to
> stick to whatever naming scheme that would not cause troubles to
> project participants.  If your participants all want to (and can)
> use ISO-8859-1, we do not discourage them from doing so.
> 

ISO-8859-x file names may be fine if you won't ever need to:
- use git-web, JGit, gitk, git-gui...
- exchange repos with "normal" (UTF-8) Unices, Mac and Windows systems
- publish your work on a git hosting service (and expect file and
  ref names to show up correctly in the web interface)
- store the repo on Unicode-based file systems (JFS, Joliet, UDF,
  exFat, NTFS, HFS, CIFS...)

These restrictions are not that obvious when you start a new git
project, and while converting file names after the fact is possible
(e.g. using the recodetree script we shipped with Git for Windows
1.7.10), it will destroy history.

Thus I think we should strongly discourage users from using anything
but UTF-8.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Makefile / racy-git.txt: clarify USE_NSEC prerequisites

2015-06-13 Thread Karsten Blees
Signed-off-by: Karsten Blees 
---
Enabling nanosecond file times was recently discussed on the libgit2 project, so
I thought its time to fix the nanosecond issue on Linux. Don't know yet if the
patch will be accepted (and in which kernel version).

Considering that nanosecond file times are still broken for some file systems, 
it
may be desirable to make this a config option in addition to the compile-time
setting? I.e. only use sub-second file times for up-to-date checks if the config
option is enabled, so that it can be turned off on file systems with flaky
timestamps.

 Documentation/technical/racy-git.txt | 8 ++--
 Makefile | 9 +
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/Documentation/technical/racy-git.txt 
b/Documentation/technical/racy-git.txt
index 242a044..89ca173 100644
--- a/Documentation/technical/racy-git.txt
+++ b/Documentation/technical/racy-git.txt
@@ -42,12 +42,16 @@ compared, but this is not enabled by default because this 
member
 is not stable on network filesystems.  With `USE_NSEC`
 compile-time option, `st_mtim.tv_nsec` and `st_ctim.tv_nsec`
 members are also compared, but this is not enabled by default
-because in-core timestamps can have finer granularity than
+because on Linux, in-core timestamps can have finer granularity than
 on-disk timestamps, resulting in meaningless changes when an
 inode is evicted from the inode cache.  See commit 8ce13b0
 of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
 ([PATCH] Sync in core time granularity with filesystems,
-2005-01-04).
+2005-01-04). This patch is included in kernel 2.6.11 and newer, but
+only fixes the issue for file systems with exactly 1 ns or 1 s
+resolution. Other file systems are still broken in current Linux
+kernels (e.g. CEPH, CIFS, NTFS, UDF), see
+https://lkml.org/lkml/2015/6/9/714
 
 Racy Git
 
diff --git a/Makefile b/Makefile
index 54ec511..46d181a 100644
--- a/Makefile
+++ b/Makefile
@@ -217,10 +217,11 @@ all::
 # as the compiler can crash (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49299)
 #
 # Define USE_NSEC below if you want git to care about sub-second file mtimes
-# and ctimes. Note that you need recent glibc (at least 2.2.4) for this, and
-# it will BREAK YOUR LOCAL DIFFS! show-diff and anything using it will likely
-# randomly break unless your underlying filesystem supports those sub-second
-# times (my ext3 doesn't).
+# and ctimes. Note that you need recent glibc (at least 2.2.4) for this. On
+# Linux, kernel 2.6.11 or newer is required for reliable sub-second file times
+# on file systems with exactly 1 ns or 1 s resolution. If you intend to use Git
+# on other file systems (e.g. CEPH, CIFS, NTFS, UDF), don't enable USE_NSEC. 
See
+# Documentation/technical/racy-git.txt for details.
 #
 # Define USE_ST_TIMESPEC if your "struct stat" uses "st_ctimespec" instead of
 # "st_ctim"
-- 
2.4.3.windows.1.1.g87477f9

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Documentation/i18n.txt: clarify character encoding support

2015-06-13 Thread Karsten Blees
As a "distributed" VCS, git should better define the encodings of its core
textual data structures, in particular those that are part of the network
protocol.

That git is encoding agnostic is only really true for blob objects. E.g.
the 'non-NUL bytes' requirement of tree and commit objects excludes
UTF-16/32, and the special meaning of '/' in the index file as well as
space and linefeed in commit objects eliminates EBCDIC and other non-ASCII
encodings.

Git expects bytes < 0x80 to be pure ASCII, thus CJK encodings that partly
overlap with the ASCII range are problematic as well. E.g. fmt_ident()
removes trailing 0x5C from user names on the assumption that it is ASCII
'\'. However, there are over 200 GBK double byte codes that end in 0x5C.

UTF-8 as default encoding on Linux and respective path translations in the
Mac and Windows versions have established UTF-8 NFC as de-facto standard
for path names.

Update the documentation in i18n.txt to reflect the current status-quo.

Signed-off-by: Karsten Blees 
---
 Documentation/i18n.txt | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/Documentation/i18n.txt b/Documentation/i18n.txt
index e9a1d5d..e5f6233 100644
--- a/Documentation/i18n.txt
+++ b/Documentation/i18n.txt
@@ -1,18 +1,28 @@
-At the core level, Git is character encoding agnostic.
-
- - The pathnames recorded in the index and in the tree objects
-   are treated as uninterpreted sequences of non-NUL bytes.
-   What readdir(2) returns are what are recorded and compared
-   with the data Git keeps track of, which in turn are expected
-   to be what lstat(2) and creat(2) accepts.  There is no such
-   thing as pathname encoding translation.
+Git is to some extent character encoding agnostic.
 
  - The contents of the blob objects are uninterpreted sequences
of bytes.  There is no encoding translation at the core
level.
 
- - The commit log messages are uninterpreted sequences of non-NUL
-   bytes.
+ - Pathnames are encoded in UTF-8 normalization form C. This
+   applies to tree objects, the index file, ref names and
+   config files (`.git/config` (see linkgit:git-config[1]),
+   linkgit:gitignore[5], linkgit:gitattributes[5] and
+   linkgit:gitmodules[5]).
+   The Mac and Windows versions automatically translate pathnames
+   to and from UTF-8 NFC in their readdir(2), lstat(2), creat(2)
+   etc. APIs. However, there is no such translation on other
+   platforms. If file system APIs don't use UTF-8 (which may be
+   file system specific), it is recommended to stick to pure
+   ASCII file names. While Git technically supports other
+   extended ASCII encodings at the core level, such repositories
+   will not be portable.
+
+ - Commit log messages are typically encoded in UTF-8, but other
+   extended ASCII encodings are also supported. This includes
+   ISO-8859-x, CP125x and many others, but _not_ UTF-16/32,
+   EBCDIC and CJK multi-byte encodings (GBK, Shift-JIS, Big5,
+   EUC-x, CP9xx etc.).
 
 Although we encourage that the commit log messages are encoded
 in UTF-8, both the core and Git Porcelain are designed not to
-- 
2.4.1.windows.1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH (resend)] git-gui: make gc warning threshold match 'git gc --auto'

2015-06-13 Thread Karsten Blees
Date: Wed, 6 Aug 2014 20:43:46 +0200

The number of loose objects at which git-gui shows a gc warning has
historically been hardcoded to ~2000, or ~200 on Windows. The warning can
only be disabled completely via gui.gcwarning=false.

Especially on Windows, the hardcoded threshold is so ridiculously low that
git-gui often complains even immediately after gc (due to loose objects
only referenced by the reflog).

'git gc --auto' uses a much bigger threshold to check if gc is necessary.
Additionally, the value can be configured via gc.auto (default 6700).
There's no special case for Windows.

Change git-gui so that it only warns if 'git gc --auto' would also do an
automatic gc, i.e.:
 - calculate the threshold from the gc.auto setting (default 6700,
   disabled if <= 0)
 - check directory .git/objects/17

We still check four directories (14-17) if gc.auto is very small, to get a
better estimate.

Signed-off-by: Karsten Blees 
---
 git-gui/lib/database.tcl | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/git-gui/lib/database.tcl b/git-gui/lib/database.tcl
index 1f187ed..212b195 100644
--- a/git-gui/lib/database.tcl
+++ b/git-gui/lib/database.tcl
@@ -89,19 +89,26 @@ proc do_fsck_objects {} {
 }
 
 proc hint_gc {} {
+   global repo_config
+   set auto_gc $repo_config(gc.auto)
+   if {$auto_gc eq {}} {
+   set auto_gc 6700
+   } elseif {$auto_gc <= 0} {
+   return
+   }
+
set ndirs 1
-   set limit 8
-   if {[is_Windows]} {
+   set limit [expr {($auto_gc + 255) / 256}]
+   if {$limit < 4} {
set ndirs 4
-   set limit 1
}
 
set count [llength [glob \
-nocomplain \
-- \
-   [gitdir objects 4\[0-[expr {$ndirs-1}]\]/*]]]
+   [gitdir objects 1\[[expr {8-$ndirs}]-7\]/*]]]
 
-   if {$count >= $limit * $ndirs} {
+   if {$count > $limit * $ndirs} {
set objects_current [expr {$count * 256/$ndirs}]
if {[ask_popup \
[mc "This repository currently has approximately %i 
loose objects.
-- 
2.4.1.windows.1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-new-workdir: add windows compatibility

2015-05-26 Thread Karsten Blees
Am 26.05.2015 um 06:03 schrieb Junio C Hamano:
> Daniel Smith  writes:
> 
>> When running on Windows in MinGW, creating symbolic links via ln always
>> failed.
>>
>> Using mklink instead of ln is the recommended method of creating links on
>> Windows:
>> http://stackoverflow.com/questions/18641864/git-bash-shell-fails-to-create-symbolic-links
>>
> 
> I'll defer to Windows folks if "mklink" is a sensible thing to use
> or not; I have no first-hand experience with Windows, but only heard
> that links are for admin user only or something like that, so I want
> to hear from people whose judgement on Windows matters I trust.
> 

mklink:
 - is not available on Windows XP
 - requires special permissions
 - does not work on network shares (unless enabled via 'fsutil behavior')
 - only works on file systems that support reparse points (e.g. NTFS, not FAT)

AFAICT, the MSys2 symlink() implementation is pretty smart to detect these 
conditions and fall back to deep copy (aka 'cp -a') if symlinks are not 
supported.

IOW, using 'ln -s' will hopefully "just work" in the upcoming Git for Windows 
2, thus trying to fix it for MSys1 / Git for Windows 1.9x is probably a lost 
cause.

Karsten

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/4] UTF8 BOM follow-up

2015-04-17 Thread Karsten Blees
Am 16.04.2015 um 20:39 schrieb Junio C Hamano:
> This is on top of the ".gitignore can start with UTF8 BOM" patch
> from Carlos.
> 
> Second try; the first patch is new to clarify the logic in the
> codeflow after Carlos's patch, and the second one has been adjusted
> accordingly.
> 
> Junio C Hamano (4):
>   add_excludes_from_file: clarify the bom skipping logic
>   utf8-bom: introduce skip_utf8_bom() helper
>   config: use utf8_bom[] from utf.[ch] in git_parse_source()
>   attr: skip UTF8 BOM at the beginning of the input file
> 


Wouldn't it be better to just strip the BOM on commit, e.g. via a clean filter 
or pre-commit hook (as suggested in [1])? Or is this patch series only meant to 
supplement such a solution (i.e. only strip the BOM when reading files from the 
working-copy rather than the committed tree)?


According to rfc3629 chapter 6 [2], the use of a BOM as encoding signature 
should be forbidden if the encoding is *known* to be always UTF-8. And 
.gitignore, .gitattributes and .gitmodules contain path names, which are always 
UTF-8 as of Git for Windows v1.7.10.

IOW, allowing a BOM would mean that files *without* BOM are *not* UTF-8 and 
need to be decoded from e.g. system encoding (which unfortunately cannot be set 
to UTF-8 on Windows). But this makes no sense as the repository would not be 
portable. E.g. a .gitattributes file created on a Greek Windows, containing 
greek path names in Cp1253, would not work on platforms with different encoding.

On the other hand, just ignoring the BOM (as this patch series does) leaves us 
with two alternative binary representations of the same content file...i.e. 
we'll eventually end up with spurious 1st line changes as users add / remove 
BOMs from committed .git[ignore|attributes|modules] files, depending on their 
editor preference...


For local files (.gitconfig, .git/info/exclude, .git/COMMIT_EDITMSG...), 
auto-detecting encoding based on the presence of a BOM makes somewhat more 
sense. However, this will most likely break editors that follow the 
recommendation of the Unicode specification ("Use of a BOM is neither required 
nor recommended for UTF-8" [3]). So we'd probably need a core.editorEncoding or 
core.editorUseBom setting to tell git whether "no BOM" means UTF-8 or system 
encoding...

Just as a reminder: we should update the Git for Windows Unicode document [4] 
if we improve support for BOM-adamant editors.

Cheers,
Karsten

[1] 
http://stackoverflow.com/questions/27223985/git-ignore-bom-prevent-git-diff-from-showing-byte-order-mark-changes
[2] https://tools.ietf.org/html/rfc3629
[3] http://www.unicode.org/versions/Unicode7.0.0/ch02.pdf  p.40
[4] 
https://github.com/msysgit/msysgit/wiki/Git-for-Windows-Unicode-Support#editor


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Win32: nanosecond-precision file times

2015-02-17 Thread Karsten Blees
Am 16.02.2015 um 23:10 schrieb Junio C Hamano:
> Karsten Blees  writes:
> 
>> However, the Makefile has this to say on the subject:
>>
>> # Define USE_NSEC below if you want git to care about sub-second file mtimes
>> # and ctimes. Note that you need recent glibc (at least 2.2.4) for this, and
>> # it will BREAK YOUR LOCAL DIFFS! show-diff and anything using it will likely
>> # randomly break unless your underlying filesystem supports those sub-second
>> # times (my ext3 doesn't).
>>
>> Am I missing something?
> 
[...]
> 
> If you use NSEC, however, and "refresh" grabbed a subsecond time and
> then later "diff-files" learned a truncated/rounded time because the
> filesystem later purged the cached inodes and re-read it from the
> underlying filesystem with no subsecond time resolution, the times
> would not match so you will again see "diff-files" report that "foo"
> is now different.
> 
> That is what the comment you cited is about.
> 

OK, so it all boils down to the "inode cache doesn't round to on-disk
resolution" issue after all, as explained in racy-git.txt.

But then the Makefile comment is quite misleading. Enabling USE_NSEC
will not unconditionally "BREAK YOUR LOCAL DIFFS". Show-diff / diff-files
will also not "break", but may report false positives instead (which may
be worse than failing hard...).

It also seems to me that this is a Linux-only issue which is only remotely
related to the USE_NSEC setting or file systems' timestamp resolutions.

The kernel patch referenced in racy-git.txt only addresses sub-second
resolutions. So even if USE_NSEC is *disabled*, the diff-files issue will
bite you on e.g. FAT32-formatted flash-drives on Linux, at least on
re-mount ("sync && echo 2>/proc/sys/vm/drop_caches" didn't seem to trigger
the rounding, though).

I also suspect that the sub-second rounding function of that patch
(timespec_trunc()) takes some invalid shortcuts - if you configure the
kernel for 300 jiffies per second (i.e. 3,333,333 ns per tick), UDF, NTFS,
SMBFS and CIFS file times will most likely not be properly rounded in the
inode cache. Haven't tested this, though.

So the only file systems with reliable file times on Linux seem to be
those with exactly 1s or 1ns resolution...?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Win32: nanosecond-precision file times

2015-02-16 Thread Karsten Blees
Am 13.02.2015 um 20:28 schrieb Junio C Hamano:
> Karsten Blees  writes:
> 
>> Am 13.02.2015 um 00:38 schrieb Junio C Hamano:
>>>
>>> We do have sec/nsec fields in cache_time structure, so I have
>>> nothing against updating the msysGit port to fill that value.
> 
> Having said that, we do not enable the NSEC stuff by default on Unix
> for a reason.  I'd expect those who know Windows filesystems well to
> pick the default there wisely ;-)
> 

Now I'm a bit confused about the discrepancy between racy-git.txt and
the Makefile.

Racy-git.txt explains that the nsec-part may be dropped when an inode
is flushed to disk if the file system doesn't support nsec resolution.
This was supposedly an issue with the Linux kernel fixed back in 2005.

In my understanding, this means that git would see the file as
changed and re-check the content (i.e. it will hurt performance).

IOW: Git may be slow if the file system cache has better file time
resolution than the on-disk file system representation.


However, the Makefile has this to say on the subject:

# Define USE_NSEC below if you want git to care about sub-second file mtimes
# and ctimes. Note that you need recent glibc (at least 2.2.4) for this, and
# it will BREAK YOUR LOCAL DIFFS! show-diff and anything using it will likely
# randomly break unless your underlying filesystem supports those sub-second
# times (my ext3 doesn't).

Am I missing something? Is there anything in Git that will actually
"break" with USE_NSEC if the OS / file system doesn't support it
(rather than just being slow)?

History:
* The Makefile comment was added in 2005 (bdd4da59), along with a
  comment in read-cache.c explaining the issue (i.e. flushing to disk
  will clear the nsec field).
* The comment in read-cache.c was removed in 2008 (7a51ed66),
  seemingly dropping USE_NSEC support entirely.
* USE_NSEC support was re-added (without the read-cache.c comment) in
  2009 (fba2f38a).


Regarding the Windows situation: I've just verified (on my Win7 x64
box) that file times obtained through a variety of APIs (GetFileTime,
GetFileAttributesEx, GetFileInformationByHandle, FindFirstFile) are
consistent and properly rounded to the file system's resolution (e.g.
10ms / 2s for FAT). This is even if the file is still open and I try
to SetFileTime() to unrounded values.

So I think enabling USE_NSEC should be fine on Windows.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Win32: nanosecond-precision file times

2015-02-12 Thread Karsten Blees
Am 13.02.2015 um 00:38 schrieb Junio C Hamano:
> Karsten Blees  writes:
> 
>> This is more about copying 'old' things around, which usually also
>> copies mtime on Windows. E.g.:
>>
>>   # create two files with slightly different mtime
>>   for i in {1..10}; do (echo "v1" >> test); done &&
>>   for i in {1..10}; do (echo "v2" >> test2); done
>>   # wait a bit so that '.git/index' is always newer than 'test' / 'test2'
>>   sleep 1
>>   git add test
>>   git commit -m v1
>>   # copy test2 over test (similar to 'cp -p', but native 'copy' also
>>   # copies mtime nanoseconds)
>>   cmd //c "copy /y test2 test"
>>   git add test
>>   git commit -m v2
>>
>> Without these patches, git does not detect the change, and the second
>> git add / git commit are noops.
> 
> We do have sec/nsec fields in cache_time structure, so I have
> nothing against updating the msysGit port to fill that value.
> 
> I was and am just reacting to the fact that this is sold as if it
> "fixes" something.

Sorry, that must have been a misunderstanding. This series does
NOT fix the problem with VSS2Git, nor any other tool that abuses
mtime for the author's birthday or whatever.

The issue that two files may accidentally have the same size and
mtime was just brought up in this discussion.

> It doesn't fundamentally change the fact that
> mtime that does not follow the semantics Dscho mentioned in his
> earlier message does not work well with Git.
> 
> Having said that, even with such a patch, as long as the system is
> sufficiently fast, test and test2 will have nonoseconds identical
> timestamp and you would have the same issue, no?
> 

Right. Where "sufficiently fast" would mean opening and closing a
file ten times in less than 100ns...on Windows... ;-)

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] Win32: implement nanosecond-precision file times

2015-02-12 Thread Karsten Blees
Am 13.02.2015 um 00:15 schrieb Thomas Braun:
> Am 12.02.2015 um 00:53 schrieb Karsten Blees:
>> diff --git a/config.mak.uname b/config.mak.uname
>> index b64b63c..a18a4cc 100644
>> --- a/config.mak.uname
>> +++ b/config.mak.uname
>> @@ -346,7 +346,7 @@ ifeq ($(uname_S),Windows)
>>  NO_SVN_TESTS = YesPlease
>>  RUNTIME_PREFIX = YesPlease
>>  NO_ST_BLOCKS_IN_STRUCT_STAT = YesPlease
>> -NO_NSEC = YesPlease
>> +USE_NSEC = YesPlease
>>  USE_WIN32_MMAP = YesPlease
>>  # USE_NED_ALLOCATOR = YesPlease
>>  UNRELIABLE_FSTAT = UnfortunatelyYes
>> @@ -498,7 +498,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
>>  NO_PERL_MAKEMAKER = YesPlease
>>  RUNTIME_PREFIX = YesPlease
>>  NO_ST_BLOCKS_IN_STRUCT_STAT = YesPlease
>> -NO_NSEC = YesPlease
>> +USE_NSEC = YesPlease
>>  USE_WIN32_MMAP = YesPlease
>>  USE_NED_ALLOCATOR = YesPlease
>>  UNRELIABLE_FSTAT = UnfortunatelyYes
>>
> 
> Why not also enable it in our special msysgit section?
> 
> diff --git a/config.mak.uname b/config.mak.uname
> index b64b63c..6326794 100644
> --- a/config.mak.uname
> +++ b/config.mak.uname
> @@ -535,6 +535,7 @@ ifneq (,$(wildcard ../THIS_IS_MSYSGIT))
> INTERNAL_QSORT = YesPlease
> HAVE_LIBCHARSET_H = YesPlease
> NO_GETTEXT = YesPlease
> +   USE_NSEC = YesPlease
>  else
> NO_CURL = YesPlease
>  endif
> 

The msysgit section is within MINGW (i.e. already covered by the 2nd hunk), 
don't let the indentation fool you.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Win32: nanosecond-precision file times

2015-02-12 Thread Karsten Blees
Am 12.02.2015 um 20:48 schrieb Junio C Hamano:
> Karsten Blees  writes:
> 
>> However, some users have expressed concerns that 'same size and
>> mtime' [2] may theoretically happen by chance in daily operation.
> 
> Hmph.
> 
> Haven't we already accepted that it is not just "may theoretically
> happen" and had counter-measures in racy-git detection machinery
> for quite some time?
> 

Racy-git only triggers for files that are modified at the same time
as .git/index (i.e. we don't know if the stat cache is up to date).

This is more about copying 'old' things around, which usually also
copies mtime on Windows. E.g.:

  # create two files with slightly different mtime
  for i in {1..10}; do (echo "v1" >> test); done &&
  for i in {1..10}; do (echo "v2" >> test2); done
  # wait a bit so that '.git/index' is always newer than 'test' / 'test2'
  sleep 1
  git add test
  git commit -m v1
  # copy test2 over test (similar to 'cp -p', but native 'copy' also
  # copies mtime nanoseconds)
  cmd //c "copy /y test2 test"
  git add test
  git commit -m v2

Without these patches, git does not detect the change, and the second
git add / git commit are noops.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] Win32: implement nanosecond-precision file times

2015-02-11 Thread Karsten Blees
We no longer use any of MSVCRT's stat-functions, so there's no need to
stick to a CRT-compatible 'struct stat' either.

Define and use our own POSIX-2013-compatible 'struct stat' with nanosecond-
precision file times.

Signed-off-by: Karsten Blees 
---
 compat/mingw.c   | 12 ++--
 compat/mingw.h   | 43 +++
 config.mak.uname |  4 ++--
 3 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 6d73a3d..e4d5e3f 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -442,9 +442,9 @@ static int do_lstat(int follow, const char *file_name, 
struct stat *buf)
buf->st_size = fdata.nFileSizeLow |
(((off_t)fdata.nFileSizeHigh)<<32);
buf->st_dev = buf->st_rdev = 0; /* not used by Git */
-   buf->st_atime = filetime_to_time_t(&(fdata.ftLastAccessTime));
-   buf->st_mtime = filetime_to_time_t(&(fdata.ftLastWriteTime));
-   buf->st_ctime = filetime_to_time_t(&(fdata.ftCreationTime));
+   filetime_to_timespec(&(fdata.ftLastAccessTime), 
&(buf->st_atim));
+   filetime_to_timespec(&(fdata.ftLastWriteTime), &(buf->st_mtim));
+   filetime_to_timespec(&(fdata.ftCreationTime), &(buf->st_ctim));
if (fdata.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) {
WIN32_FIND_DATAW findbuf;
HANDLE handle = FindFirstFileW(wfilename, &findbuf);
@@ -550,9 +550,9 @@ int mingw_fstat(int fd, struct stat *buf)
buf->st_mode = file_attr_to_st_mode(fdata.dwFileAttributes);
buf->st_size = fdata.nFileSizeLow |
(((off_t)fdata.nFileSizeHigh)<<32);
-   buf->st_atime = filetime_to_time_t(&(fdata.ftLastAccessTime));
-   buf->st_mtime = filetime_to_time_t(&(fdata.ftLastWriteTime));
-   buf->st_ctime = filetime_to_time_t(&(fdata.ftCreationTime));
+   filetime_to_timespec(&(fdata.ftLastAccessTime), 
&(buf->st_atim));
+   filetime_to_timespec(&(fdata.ftLastWriteTime), &(buf->st_mtim));
+   filetime_to_timespec(&(fdata.ftCreationTime), &(buf->st_ctim));
return 0;
 
case FILE_TYPE_CHAR:
diff --git a/compat/mingw.h b/compat/mingw.h
index f2a78b4..8dee9c9 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -293,22 +293,48 @@ static inline long long filetime_to_hnsec(const FILETIME 
*ft)
return winTime - 1164447360LL;
 }
 
-static inline time_t filetime_to_time_t(const FILETIME *ft)
+struct timespec {
+   time_t tv_sec;
+   long tv_nsec;
+};
+
+static inline void filetime_to_timespec(const FILETIME *ft, struct timespec 
*ts)
 {
-   return (time_t)(filetime_to_hnsec(ft) / 1000);
+   long long hnsec = filetime_to_hnsec(ft);
+   ts->tv_sec = (time_t)(hnsec / 1000);
+   ts->tv_nsec = (hnsec % 1000) * 100;
 }
 
 /*
- * Use mingw specific stat()/lstat()/fstat() implementations on Windows.
+ * Use mingw specific stat()/lstat()/fstat() implementations on Windows,
+ * including our own struct stat with 64 bit st_size and nanosecond-precision
+ * file times.
  */
 #define off_t off64_t
 #define lseek _lseeki64
 
-/* use struct stat with 64 bit st_size */
+struct mingw_stat {
+_dev_t st_dev;
+_ino_t st_ino;
+_mode_t st_mode;
+short st_nlink;
+short st_uid;
+short st_gid;
+_dev_t st_rdev;
+off64_t st_size;
+struct timespec st_atim;
+struct timespec st_mtim;
+struct timespec st_ctim;
+};
+
+#define st_atime st_atim.tv_sec
+#define st_mtime st_mtim.tv_sec
+#define st_ctime st_ctim.tv_sec
+
 #ifdef stat
 #undef stat
 #endif
-#define stat _stati64
+#define stat mingw_stat
 int mingw_lstat(const char *file_name, struct stat *buf);
 int mingw_stat(const char *file_name, struct stat *buf);
 int mingw_fstat(int fd, struct stat *buf);
@@ -321,13 +347,6 @@ int mingw_fstat(int fd, struct stat *buf);
 #endif
 #define lstat mingw_lstat
 
-#ifndef _stati64
-# define _stati64(x,y) mingw_stat(x,y)
-#elif defined (_USE_32BIT_TIME_T)
-# define _stat32i64(x,y) mingw_stat(x,y)
-#else
-# define _stat64(x,y) mingw_stat(x,y)
-#endif
 
 int mingw_utime(const char *file_name, const struct utimbuf *times);
 #define utime mingw_utime
diff --git a/config.mak.uname b/config.mak.uname
index b64b63c..a18a4cc 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -346,7 +346,7 @@ ifeq ($(uname_S),Windows)
NO_SVN_TESTS = YesPlease
RUNTIME_PREFIX = YesPlease
NO_ST_BLOCKS_IN_STRUCT_STAT = YesPlease
-   NO_NSEC = YesPlease
+   USE_NSEC = YesPlease
USE_WIN32_MMAP = YesPlease
# USE_NED_ALLOCATOR = YesPlease
UNRELIABLE_FSTAT = UnfortunatelyYes
@@ -498,7 +498,7 

[PATCH 2/3] Win32: replace MSVCRT's fstat() with a Win32-based implementation

2015-02-11 Thread Karsten Blees
fstat() is the only stat-related CRT function for which we don't have a
full replacement yet (and thus the only reason to stick with MSVCRT's
'struct stat' definition).

Fully implement fstat(), in preparation of implementing a POSIX 2013
compatible 'struct stat' with nanosecond-precision file times.

Signed-off-by: Karsten Blees 
---
 compat/mingw.c | 28 +++-
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index ba3cfb0..6d73a3d 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -532,28 +532,38 @@ int mingw_fstat(int fd, struct stat *buf)
 {
HANDLE fh = (HANDLE)_get_osfhandle(fd);
BY_HANDLE_FILE_INFORMATION fdata;
+   DWORD avail;
 
if (fh == INVALID_HANDLE_VALUE) {
errno = EBADF;
return -1;
}
-   /* direct non-file handles to MS's fstat() */
-   if (GetFileType(fh) != FILE_TYPE_DISK)
-   return _fstati64(fd, buf);
 
-   if (GetFileInformationByHandle(fh, &fdata)) {
-   buf->st_ino = 0;
-   buf->st_gid = 0;
-   buf->st_uid = 0;
-   buf->st_nlink = 1;
+   /* initialize stat fields */
+   memset(buf, 0, sizeof(*buf));
+   buf->st_nlink = 1;
+
+   switch (GetFileType(fh) & ~FILE_TYPE_REMOTE) {
+   case FILE_TYPE_DISK:
+   if (!GetFileInformationByHandle(fh, &fdata))
+   break;
buf->st_mode = file_attr_to_st_mode(fdata.dwFileAttributes);
buf->st_size = fdata.nFileSizeLow |
(((off_t)fdata.nFileSizeHigh)<<32);
-   buf->st_dev = buf->st_rdev = 0; /* not used by Git */
buf->st_atime = filetime_to_time_t(&(fdata.ftLastAccessTime));
buf->st_mtime = filetime_to_time_t(&(fdata.ftLastWriteTime));
buf->st_ctime = filetime_to_time_t(&(fdata.ftCreationTime));
return 0;
+
+   case FILE_TYPE_CHAR:
+   buf->st_mode = _S_IFCHR;
+   return 0;
+
+   case FILE_TYPE_PIPE:
+   buf->st_mode = _S_IFIFO;
+   if (PeekNamedPipe(fh, NULL, 0, NULL, &avail, NULL))
+   buf->st_size = avail;
+   return 0;
}
errno = EBADF;
return -1;
-- 
2.3.0.3.ge7778af


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] Win32: make FILETIME conversion functions public

2015-02-11 Thread Karsten Blees
Signed-off-by: Karsten Blees 
---
 compat/mingw.c | 16 
 compat/mingw.h | 16 
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 70f3191..ba3cfb0 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -419,22 +419,6 @@ int mingw_chmod(const char *filename, int mode)
return _wchmod(wfilename, mode);
 }
 
-/*
- * The unit of FILETIME is 100-nanoseconds since January 1, 1601, UTC.
- * Returns the 100-nanoseconds ("hekto nanoseconds") since the epoch.
- */
-static inline long long filetime_to_hnsec(const FILETIME *ft)
-{
-   long long winTime = ((long long)ft->dwHighDateTime << 32) + 
ft->dwLowDateTime;
-   /* Windows to Unix Epoch conversion */
-   return winTime - 1164447360LL;
-}
-
-static inline time_t filetime_to_time_t(const FILETIME *ft)
-{
-   return (time_t)(filetime_to_hnsec(ft) / 1000);
-}
-
 /* We keep the do_lstat code in a separate function to avoid recursion.
  * When a path ends with a slash, the stat will fail with ENOENT. In
  * this case, we strip the trailing slashes and stat again.
diff --git a/compat/mingw.h b/compat/mingw.h
index 5e499cf..f2a78b4 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -283,6 +283,22 @@ static inline int getrlimit(int resource, struct rlimit 
*rlp)
 }
 
 /*
+ * The unit of FILETIME is 100-nanoseconds since January 1, 1601, UTC.
+ * Returns the 100-nanoseconds ("hekto nanoseconds") since the epoch.
+ */
+static inline long long filetime_to_hnsec(const FILETIME *ft)
+{
+   long long winTime = ((long long)ft->dwHighDateTime << 32) + 
ft->dwLowDateTime;
+   /* Windows to Unix Epoch conversion */
+   return winTime - 1164447360LL;
+}
+
+static inline time_t filetime_to_time_t(const FILETIME *ft)
+{
+   return (time_t)(filetime_to_hnsec(ft) / 1000);
+}
+
+/*
  * Use mingw specific stat()/lstat()/fstat() implementations on Windows.
  */
 #define off_t off64_t
-- 
2.3.0.3.ge7778af


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] Win32: nanosecond-precision file times

2015-02-11 Thread Karsten Blees
This patch series was inspired by the problem that Git does not
detect changed file content if st_size, st_mtime and st_ctime
are unchanged. This was apparently caused by VSS2Git resetting
mtime to a value in the past. [1]

I believe (or rather hope) that all involved in the discussion
agree that Git cannot reasonably be expected to detect changed
file content if file time(s) are reset on purpose.

However, some users have expressed concerns that 'same size and
mtime' [2] may theoretically happen by chance in daily operation.

This patch series adopts POSIX 2013 'struct timespec' file times
to make this practically impossible, at least on NTFS with 100ns
file time resolution.

Cheers,
Karsten

[1] https://github.com/msysgit/git/issues/312
[2] Note that st_ctime of a file never changes on Windows, as it
means 'creation time' rather than 'change status time'.

Karsten Blees (3):
  Win32: make FILETIME conversion functions public
  Win32: replace MSVCRT's fstat() with a Win32-based implementation
  Win32: implement nanosecond-precision file times

 compat/mingw.c   | 56 +---
 compat/mingw.h   | 55 +--
 config.mak.uname |  4 ++--
 3 files changed, 72 insertions(+), 43 deletions(-)

-- 
2.3.0.3.ge7778af

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 8/9] autoconf: Check for timer_settime

2014-09-10 Thread Karsten Blees
Am 29.08.2014 19:40, schrieb Keller, Jacob E:
> On Fri, 2014-08-29 at 19:26 +0200, Johannes Sixt wrote:
>> Am 29.08.2014 18:42, schrieb Jacob Keller:
>>> From: Jonas 'Sortie' Termansen 
>>>
>>> This function will be used in a following commit.
>>>
>>> The timer_settime function is provided in librt on some systems. We
>>> already use this library sometimes to get clock_gettime, so rework the
>>> logic so we don't link with it twice.
>>>
>>> This function was not previously used by git. This can cause trouble for
>>> people on systems without timer_settime if they only rely on
>>> config.mak.uname. They will need to set NO_TIMER_SETTIME manually.
>>>
>>> Add proper replacement function macros for setitimer and timer_settime
>>> that implement timer_settime as a wrapper for setitimer. In this way, if
>>> the system has setitimer but not timer_settime then we will be able to
>>> call timer_create, timer_settime, and timer_delete correctly and it will
>>> wrap to setitimer under the hood. This will be used in the following
>>> commit.
>>>
>>> Signed-off-by: Jonas 'Sortie' Termansen 
>>> Signed-off-by: Jacob Keller 
>>> ---
>>>  Makefile  | 21 +
>>>  config.mak.uname  |  3 +++
>>>  configure.ac  |  8 
>>>  git-compat-util.h | 20 +++-
>>>  4 files changed, 51 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/Makefile b/Makefile
>>> index 66329e4b372b..5337ef0b7cd6 100644
>>> --- a/Makefile
>>> +++ b/Makefile
>>> @@ -182,16 +182,22 @@ all::
>>>  #
>>>  # Define NO_SETITIMER if you don't have setitimer()
>>>  #
>>> +# Define NO_TIMER_SETTIME if you don't have timer_settime()
>>> +#
>>>  # Define NO_TIMER_T if you don't have timer_t.
>>> +# This also implies NO_TIMER_SETTIME
>>>  #
>>>  # Define NO_STRUCT_TIMESPEC if you don't have struct timespec
>>> +# This also implies NO_TIMER_SETTIME
>>>  #
>>>  # Define NO_STRUCT_SIGEVENT if you don't have struct sigevent
>>> +# This also implies NO_TIMER_SETTIME
>>>  #
>>>  # Define NO_STRUCT_ITIMERVAL if you don't have struct itimerval
>>>  # This also implies NO_SETITIMER
>>>  #
>>>  # Define NO_STRUCT_ITIMERSPEC if you don't have struct itimerspec
>>> +# This also implies NO_TIMER_SETTIME
>>>  #
>>>  # Define NO_FAST_WORKING_DIRECTORY if accessing objects in pack files is
>>>  # generally faster on your platform than accessing the working directory.
>>> @@ -1348,12 +1354,15 @@ ifdef OBJECT_CREATION_USES_RENAMES
>>>  endif
>>>  ifdef NO_TIMER_T
>>> COMPAT_CFLAGS += -DNO_TIMER_T
>>> +   NO_TIMER_SETTIME = YesPlease
>>>  endif
>>>  ifdef NO_STRUCT_TIMESPEC
>>> COMPAT_CFLAGS += -DNO_STRUCT_TIMESPEC
>>> +   NO_TIMER_SETTIME = YesPlease
>>>  endif
>>>  ifdef NO_STRUCT_SIGEVENT
>>> COMPAT_CFLAGS += -DNO_STRUCT_SIGEVENT
>>> +   NO_TIMER_SETTIME = YesPlease
>>>  endif
>>>  ifdef NO_STRUCT_ITIMERVAL
>>> COMPAT_CFLAGS += -DNO_STRUCT_ITIMERVAL
>>> @@ -1361,10 +1370,14 @@ ifdef NO_STRUCT_ITIMERVAL
>>>  endif
>>>  ifdef NO_STRUCT_ITIMERSPEC
>>> COMPAT_CFLAGS += -DNO_STRUCT_ITIMERSPEC
>>> +   NO_TIMER_SETTIME = YesPlease
>>>  endif
>>>  ifdef NO_SETITIMER
>>> COMPAT_CFLAGS += -DNO_SETITIMER
>>>  endif
>>> +ifdef NO_TIMER_SETTIME
>>> +   COMPAT_CFLAGS += -DNO_TIMER_SETTIME
>>> +endif
>>>  ifdef NO_PREAD
>>> COMPAT_CFLAGS += -DNO_PREAD
>>> COMPAT_OBJS += compat/pread.o
>>> @@ -1524,6 +1537,14 @@ endif
>>>  
>>>  ifdef HAVE_CLOCK_GETTIME
>>> BASIC_CFLAGS += -DHAVE_CLOCK_GETTIME
>>> +   LINK_WITH_LIBRT = YesPlease
>>> +endif
>>> +
>>> +ifndef NO_TIMER_SETTIME
>>> +   LINK_WITH_LIBRT = YesPlease
>>> +endif
>>> +
>>> +ifdef LINK_WITH_LIBRT
>>> EXTLIBS += -lrt
>>>  endif
>>>  
>>> diff --git a/config.mak.uname b/config.mak.uname
>>> index f0d93ef868a7..d04deab2dfa8 100644
>>> --- a/config.mak.uname
>>> +++ b/config.mak.uname
>>> @@ -99,6 +99,7 @@ ifeq ($(uname_S),Darwin)
>>> USE_ST_TIMESPEC = YesPlease
>>> HAVE_DEV_TTY = YesPlease
>>> NO_STRUCT_ITIMERSPEC = UnfortunatelyYes
>>> +   NO_TIMER_SETTIME = UnfortunatelyYes
>>> COMPAT_OBJS += compat/precompose_utf8.o
>>> BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
>>>  endif
>>> @@ -360,6 +361,7 @@ ifeq ($(uname_S),Windows)
>>> NO_STRUCT_TIMESPEC = UnfortunatelyYes
>>> NO_STRUCT_SIGEVENT = UnfortunatelyYes
>>> NO_STRUCT_ITIMERSPEC = UnfortunatelyYes
>>> +   NO_TIMER_SETTIME = UnfortunatelyYes
>>>  
>>> CC = compat/vcbuild/scripts/clink.pl
>>> AR = compat/vcbuild/scripts/lib.pl
>>> @@ -513,6 +515,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
>>> NO_STRUCT_TIMESPEC = UnfortunatelyYes
>>> NO_STRUCT_SIGEVENT = UnfortunatelyYes
>>> NO_STRUCT_ITIMERSPEC = UnfortunatelyYes
>>> +   NO_TIMER_SETTIME = UnfortunatelyYes
>>> COMPAT_CFLAGS += -D__USE_MINGW_ACCESS -D_USE_32BIT_TIME_T -DNOGDI 
>>> -Icompat -Icompat/win32
>>> COMPAT_CFLAGS += -DSTRIP_EXTENSION=\".exe\"
>>> COMPAT_OBJS += compat/mingw.o compat/winansi.o \
>>> diff --git a/configure.ac b/configure.ac
>>> index 954f9ddb03c2..9d6ec41acc82 100644
>>> --- 

Re: Location of git config on Windows

2014-08-17 Thread Karsten Blees
Am 18.08.2014 00:01, schrieb Erik Faye-Lund:
> On Sun, Aug 17, 2014 at 10:18 PM, Daniel Corbe  wrote:
>>
>> I installed git on my Windows machine while it was connected to my
>> corporate network.  It picked up on that fact and used a mapped drive to
>> store its configuration file.
>>
>> As a result, I cannot currently use git when disconnected from my
>> network.  It throws the following error message: fatal: unable to access
>> 'Z:\/.config/git/config': Invalid argument
>>
>> Obviously this value is stored in the registry somewhere because I made
>> an attempt to uninstall and reinstall git with the same results.
>>
>> Can someone give me some guidance here?
> 
> Git looks for the per-user configuration in $HOME/.gitconfig, and if
> $HOME is not set, it falls back to $HOMEDIR/$HOMEPATH/.gitconfig. My
> guess would be some of these environment variables are incorrectly set
> on your system.

To be precise, git checks if %HOME% is set _and_ the directory exists before
falling back to %HOMEDRIVE%%HOMEPATH%.

If %HOMEDRIVE%%HOMEPATH% isn't set or the directory doesn't exist either, it
falls back to %USERPROFILE%, which is always local (C:/Users/), even
if disconnected from the network (at least that's how its supposed to be).


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Git for Windows 1.9.4.msysgit.1

2014-08-15 Thread Karsten Blees
Am 15.08.2014 19:14, schrieb Thomas Braun:
> Hi,
> 
> the Git for Windows team just released the second maintenance release of
> the Windows-specific installers for git 1.9.4.
> 

Thank you so much!

> Regressions
> 
> git svn is/might be broken. Fixes welcome.


rebase -b 0x6400 bin/libsvn_repos-1-0.dll
rebase -b 0x6420 bin/libneon-25.dll


See https://github.com/msysgit/msysgit/pull/245

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pack-bitmap: do not use gcc packed attribute

2014-08-06 Thread Karsten Blees
Am 05.08.2014 20:47, schrieb Jeff King:
> On Mon, Aug 04, 2014 at 09:19:46PM +0200, Karsten Blees wrote:
> 
>> Hmm, I wonder if it wouldn't be simpler to read / write the desired on-disk
>> structure directly, without copying to a uchar[6] first.
> 
> Probably. My initial attempt was to keep together the read/write logic
> about which sizes each item is, but I think the result ended up
> unnecessarily verbose and harder to follow.
> 

Yeah, having read / write logic in different files is confusing, esp. when
not using structs to define the file format.

>> Here's what I came up with (just a sketch, commit message is lacky and the
>> helper functions deserve a better place / name):
> 
> I like it. Want to clean it up and submit in place of mine?
> 

Will do, but it will have to wait till next week.

> -Peff
> 

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] git-gui: make gc warning threshold match 'git gc --auto'

2014-08-06 Thread Karsten Blees
The number of loose objects at which git-gui shows a gc warning has
historically been hardcoded to ~2000, or ~200 on Windows. The warning can
only be disabled completely via gui.gcwarning=false.

Especially on Windows, the hardcoded threshold is so ridiculously low that
git-gui often complains even immediately after gc (due to loose objects
only referenced by the reflog).

'git gc --auto' uses a much bigger threshold to check if gc is necessary.
Additionally, the value can be configured via gc.auto (default 6700).
There's no special case for Windows.

Change git-gui so that it only warns if 'git gc --auto' would also do an
automatic gc, i.e.:
 - calculate the threshold from the gc.auto setting (default 6700,
   disabled if <= 0)
 - check directory .git/objects/17

We still check four directories (14-17) if gc.auto is very small, to get a
better estimate.

Signed-off-by: Karsten Blees 
---
 git-gui/lib/database.tcl | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/git-gui/lib/database.tcl b/git-gui/lib/database.tcl
index 1f187ed..212b195 100644
--- a/git-gui/lib/database.tcl
+++ b/git-gui/lib/database.tcl
@@ -89,19 +89,26 @@ proc do_fsck_objects {} {
 }
 
 proc hint_gc {} {
+   global repo_config
+   set auto_gc $repo_config(gc.auto)
+   if {$auto_gc eq {}} {
+   set auto_gc 6700
+   } elseif {$auto_gc <= 0} {
+   return
+   }
+
set ndirs 1
-   set limit 8
-   if {[is_Windows]} {
+   set limit [expr {($auto_gc + 255) / 256}]
+   if {$limit < 4} {
set ndirs 4
-   set limit 1
}
 
set count [llength [glob \
-nocomplain \
-- \
-   [gitdir objects 4\[0-[expr {$ndirs-1}]\]/*]]]
+   [gitdir objects 1\[[expr {8-$ndirs}]-7\]/*]]]
 
-   if {$count >= $limit * $ndirs} {
+   if {$count > $limit * $ndirs} {
set objects_current [expr {$count * 256/$ndirs}]
if {[ask_popup \
[mc "This repository currently has approximately %i 
loose objects.
-- 
2.0.3.921.ga7e731a.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: struct hashmap_entry packing

2014-08-04 Thread Karsten Blees
Am 02.08.2014 00:37, schrieb Jeff King:
> On Tue, Jul 29, 2014 at 10:40:12PM +0200, Karsten Blees wrote:
> 
>>> The sizeof() has to be the same regardless of whether the hashmap_entry
>>> is standalone or in another struct, and therefore must be padded up to
>>> 16 bytes. If we stored "x" in that padding in the combined struct, it
>>> would be overwritten by our memset.
>>>
>>
>> The struct-packing patch was ultimately dropped because there was no way
>> to reliably make it work on all platforms. See [1] for discussion, [2] for
>> the final, 'most compatible' version.
> 
> Thanks for the pointers; I should have guessed there was more to it and
> searched the archive myself.
> 
>> Hmmm. Now that we have "__attribute__((packed))" in pack-bitmap.h, perhaps
>> we should do the same for stuct hashmap_entry? (Which was the original
>> proposal anyway...). Only works for GCC, but that should cover most builds
>> / platforms.
> 
> I don't see any reason to avoid the packed attribute, if it helps us. As
> you noted, anything using __attribute__ probably supports it, and if
> not, we can conditionally #define PACKED_STRUCT or something, like we do
> for NORETURN. Since it's purely an optimization, if another compiler
> doesn't use it, no big deal.
> 
> That being said, I don't know if those padding bytes are actually
> causing a measurable slowdown. It may not even be worth the trouble.
> 

Its not about performance (or correctness, in case of platforms that don't
support unaligned read), just about saving memory (e.g. mapping int to int
requires 24 bytes per entry, vs. 16 with packed structs).

The padding at the end of a structure is only needed for proper alignment in
arrays. Struct hashmap_entry is always used as prefix of some other structure,
never as an array, so there are no alignment issues here.

Typical memory layouts on 64-bit platforms are as follows (note that even in
the 'followed by int64' case, all members are properly aligned):


Unpacked struct followed by int32 - wastes 1/3 of memory:

  struct {
struct hashmap_entry {
00-07 struct hashmap_entry *next;
08-11 int hash;
12-15 // padding
} ent;
16-19   int32_t value;
20-23   // padding
  }


Packed struct followed by int32:

  struct {
struct hashmap_entry {
00-07 struct hashmap_entry *next;
08-11 int hash;
} ent;
12-15   int32_t value;
  }


Packed struct followed by int64:

  struct {
struct hashmap_entry {
00-07 struct hashmap_entry *next;
08-11 int hash;
} ent;
12-15   // padding
16-23   int64_t value;
  }


Array of packed struct (not used):

  struct hashmap_entry {
00-07   struct hashmap_entry *next;
08-11   int hash;
  }; // [0]
  struct hashmap_entry {
12-19   struct hashmap_entry *next; // !!!unaligned!!!
20-23   int hash;
  }; // [1]

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pack-bitmap: do not use gcc packed attribute

2014-08-04 Thread Karsten Blees
Am 02.08.2014 01:10, schrieb Jeff King:
> On Fri, Aug 01, 2014 at 06:37:39PM -0400, Jeff King wrote:
> 
>>> Btw.: Using struct-packing on 'struct bitmap_disk_entry' means that the
>>> binary format of .bitmap files is incompatible between GCC and other
>>> builds, correct?
>>
>> The on-disk format is defined by JGit; if there are differences between
>> the builds, that's a bug (and I would not be too surprised if there is
>> one, as bitmaps have gotten very extensive testing on 32- and 64-bit
>> gcc, but probably not much elsewhere).
>>
>> We do use structs to represent disk structures in other bits of the
>> packfile code (e.g., struct pack_idx_header), but the struct is vanilla
>> enough that we assume every compiler gives us two tightly-packed 32-bit
>> integers without having to bother with the "packed" attribute (and it
>> seems to have worked in practice).
>>
>> We should probably be more careful with that bitmap code. It looks like
>> it wouldn't be too bad to drop it. I'll see if I can come up with a
>> patch.
> 
> I confirmed that this does break horribly without the packed attribute
> (as you'd expect; it's asking for 48-bit alignment!). p5310 notices it,
> _if_ you have jgit installed to check against.
> 
> Here's a fix.
> 
> Subject: pack-bitmap: do not use gcc packed attribute
> 
> The "__attribute__" flag may be a noop on some compilers.
> That's OK as long as the code is correct without the
> attribute, but in this case it is not. We would typically
> end up with a struct that is 2 bytes too long due to struct
> padding, breaking both reading and writing of bitmaps.
> 
> We can work around this by using an array of unsigned char
> to represent the data, and relying on get/put_be32 to handle
> alignment issues as we interact with the array.
> 
> Signed-off-by: Jeff King 
> ---
> The accessors may be overkill; each function is called only a single
> time in the whole codebase. But doing it this way rather than accessing
> entry[4] inline at least puts the magic constants all in one place.
> 
[...]

Hmm, I wonder if it wouldn't be simpler to read / write the desired on-disk
structure directly, without copying to a uchar[6] first.

When writing, sha1write already buffers the data, so calling this with 4/1/1
bytes of payload shouldn't affect performance.

Similarly for reading - we already have a function to read a bitmap and
advance the 'file' position, why not have similar functions to read uint8/32
in a stream-based fashion?

This raises the question why we read via mmap at all (without munmap()ing the
file when done...). We copy all data into internal data structures anyway. Is
an fopen/fread-based solution (with fread_u8/_u32 helpers) that much slower?


Here's what I came up with (just a sketch, commit message is lacky and the
helper functions deserve a better place / name):

8<-
Subject: [PATCH] pack-bitmap: do not use packed structs to read / write bitmap
 files

Signed-off-by: Karsten Blees 
---
 pack-bitmap-write.c | 18 +-
 pack-bitmap.c   | 21 ++---
 pack-bitmap.h   |  6 --
 3 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 5f1791a..01995cb 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -465,6 +465,16 @@ static const unsigned char *sha1_access(size_t pos, void 
*table)
return index[pos]->sha1;
 }
 
+static inline void sha1write_u8(struct sha1file *f, uint8_t data)
+{
+   sha1write(f, &data, sizeof(data));
+}
+static inline void sha1write_u32(struct sha1file *f, uint32_t data)
+{
+   data = htonl(data);
+   sha1write(f, &data, sizeof(data));
+}
+
 static void write_selected_commits_v1(struct sha1file *f,
  struct pack_idx_entry **index,
  uint32_t index_nr)
@@ -473,7 +483,6 @@ static void write_selected_commits_v1(struct sha1file *f,
 
for (i = 0; i < writer.selected_nr; ++i) {
struct bitmapped_commit *stored = &writer.selected[i];
-   struct bitmap_disk_entry on_disk;
 
int commit_pos =
sha1_pos(stored->commit->object.sha1, index, index_nr, 
sha1_access);
@@ -481,11 +490,10 @@ static void write_selected_commits_v1(struct sha1file *f,
if (commit_pos < 0)
die("BUG: trying to write commit not in index");
 
-   on_disk.object_pos = htonl(commit_pos);
-   on_disk.xor_offset = stored->xor_offset;
-   on_disk.flags = stored->flags;
+   sha1write_u32(f, commit_pos);
+   sha1

Re: struct hashmap_entry packing

2014-07-29 Thread Karsten Blees
Am 28.07.2014 19:17, schrieb Jeff King:
> Hi Karsten,
> 
> The hashmap_entry documentation claims:
> 
>   `struct hashmap_entry`::
> 
>   An opaque structure representing an entry in the hash table,
>   which must be used as first member of user data structures.
>   Ideally it should be followed by an int-sized member to prevent
>   unused memory on 64-bit systems due to alignment.
> 
> I'm not sure if the statement about alignment is true. If I have a
> struct like:
> 
> struct magic {
>   struct hashmap_entry map;
>   int x;
> };
> 
> the statement above implies that I should be able to fit this into only
> 16 bytes on an LP64 system. But I can't convince gcc to do it. And I
> think that makes sense, if you consider code like:
> 
>memset(&magic.map, 0, sizeof(struct hashmap_entry));
> 
> The sizeof() has to be the same regardless of whether the hashmap_entry
> is standalone or in another struct, and therefore must be padded up to
> 16 bytes. If we stored "x" in that padding in the combined struct, it
> would be overwritten by our memset.
> 

The struct-packing patch was ultimately dropped because there was no way
to reliably make it work on all platforms. See [1] for discussion, [2] for
the final, 'most compatible' version.

> Am I missing anything? If this is the case, we should probably drop that
> bit from the documentation.

Hmmm. Now that we have "__attribute__((packed))" in pack-bitmap.h, perhaps
we should do the same for stuct hashmap_entry? (Which was the original
proposal anyway...). Only works for GCC, but that should cover most builds
/ platforms.

Btw.: Using struct-packing on 'struct bitmap_disk_entry' means that the
binary format of .bitmap files is incompatible between GCC and other
builds, correct?

> It's possible that we could get around it by
> embedding the hashmap_entry elements directly into the parent struct,

Already tried that, see [3].

[1] http://article.gmane.org/gmane.comp.version-control.git/239069
[2] http://article.gmane.org/gmane.comp.version-control.git/241865
[3] http://article.gmane.org/gmane.comp.version-control.git/239435

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: "error: Tweaking file descriptors doesn't work with this MSVCRT.dll" on wine

2014-07-29 Thread Karsten Blees
Am 28.07.2014 12:39, schrieb Duy Nguyen:
> I know wine is kind of second citizen but is there a cheap trick to
> make it work on wine? Reverting fcd428f (Win32: fix broken pipe
> detection - 2012-03-01) could result in conflicts in compat that I'm
> not comfortable resolving. I don't have Windows at home. Wine is the
> only option for me (or if somebody has a modern.ie image for KVM, or a
> simple recipe to make one, that'd be great). "Fix wine" is not really
> an option.
> 

Have you tried using a native msvcrt.dll instead of the Wine stub?
Downloading msvcrt.dll 7.0.2600 from dll-files.com and extracting it to
~/.wine/drive_c/Windows/System32 fixed the error message for me.

(The newest versions don't work as they depend on a bunch of other
dlls that you'd need too.)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC/PATCH] Windows tests: let $TRASH_DIRECTORY point to native Windows path

2014-07-29 Thread Karsten Blees
MSYS programs typically understand native Windows paths (e.g C:/git), but
native Windows programs (including MinGW) don't understand MSYS paths (e.g.
/c/git).

On Windows, set TRASH_DIRECTORY to the absolute native path so that it can
be used more easily in tests.

MSYS 'tar -f' interprets everything before ':' as hostname, not as drive
letter. Change respective tests to use stdin / stdout instead of '-f'. Also
use $TAR from GIT-BUILD-OPTIONS rather than hardcoded tar.

Signed-off-by: Karsten Blees 
---

Am 25.07.2014 14:30, schrieb Duy Nguyen:
> On Wed, Jul 23, 2014 at 9:17 PM, Karsten Blees  
> wrote:
>> With the version in pu, three tests fail. t7001 is fixed with a newer 'cp'.
>> The other two are unrelated (introduced by nd/multiple-work-trees topic).
>>
>> * t1501-worktree: failed 1
>>   As of 5bbcb072 "setup.c: support multi-checkout repo setup"
>>   Using $TRASH_DIRECTORY doesn't work on Windows.
>>
>> * t2026-prune-linked-checkouts: failed 1
>>   As of 404a45f1 "prune: strategies for linked checkouts"
>>   Dito.
> 
> I need your help here. Would saving $(pwd) to a variable and using it
> instead of $TRASH_DIRECTORY work? Some tests "cd" around and $(pwd)
> may not be the same as $TRASH_DIRECTORY.
> 

Yes, that would work.

(Actually, you'd only need to change 'echo "$TRASH_DIR..."' in two places (both
before cd'ing away). The other instances are parameters to non-msys programs and
are thus automatically mangled by msys.dll.)

However, I wonder why we don't set up TRASH_DIRECTORY to the native Windows 
path.
I believe we'd get much fewer 'special' cases that way. Ideally, you shouldn't
have to worry about the intricacies of MSYS path mangling when writing tests...

[CCing msysgit for opinions]


 t/t3513-revert-submodule.sh | 4 ++--
 t/t6041-bisect-submodule.sh | 4 ++--
 t/test-lib.sh   | 1 +
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/t/t3513-revert-submodule.sh b/t/t3513-revert-submodule.sh
index a1c4e02..4a44dd6 100755
--- a/t/t3513-revert-submodule.sh
+++ b/t/t3513-revert-submodule.sh
@@ -14,11 +14,11 @@ test_description='revert can handle submodules'
 git_revert () {
git status -su >expect &&
ls -1pR * >>expect &&
-   tar czf "$TRASH_DIRECTORY/tmp.tgz" * &&
+   "$TAR" cz * >"$TRASH_DIRECTORY/tmp.tgz" &&
git checkout "$1" &&
git revert HEAD &&
rm -rf * &&
-   tar xzf "$TRASH_DIRECTORY/tmp.tgz" &&
+   "$TAR" xz <"$TRASH_DIRECTORY/tmp.tgz" &&
git status -su >actual &&
ls -1pR * >>actual &&
test_cmp expect actual &&
diff --git a/t/t6041-bisect-submodule.sh b/t/t6041-bisect-submodule.sh
index c6b7aa6..0de614f 100755
--- a/t/t6041-bisect-submodule.sh
+++ b/t/t6041-bisect-submodule.sh
@@ -8,7 +8,7 @@ test_description='bisect can handle submodules'
 git_bisect () {
git status -su >expect &&
ls -1pR * >>expect &&
-   tar czf "$TRASH_DIRECTORY/tmp.tgz" * &&
+   "$TAR" cz * > "$TRASH_DIRECTORY/tmp.tgz" &&
GOOD=$(git rev-parse --verify HEAD) &&
git checkout "$1" &&
echo "foo" >bar &&
@@ -20,7 +20,7 @@ git_bisect () {
git bisect start &&
git bisect good $GOOD &&
rm -rf * &&
-   tar xzf "$TRASH_DIRECTORY/tmp.tgz" &&
+   "$TAR" xz <"$TRASH_DIRECTORY/tmp.tgz" &&
git status -su >actual &&
ls -1pR * >>actual &&
test_cmp expect actual &&
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 5102340..5f6397b 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -868,6 +868,7 @@ case $(uname -s) in
md5sum "$@"
}
# git sees Windows-style pwd
+   TRASH_DIRECTORY=$(pwd -W)
pwd () {
builtin pwd -W
}
-- 
2.0.2.897.g7f80809.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (Jul 2014, #04; Tue, 22)

2014-07-23 Thread Karsten Blees
Am 22.07.2014 23:44, schrieb Junio C Hamano:
> 
> * sk/mingw-uni-fix-more (2014-07-21) 14 commits
>  - Win32: enable color output in Windows cmd.exe
>  - Win32: patch Windows environment on startup
>  - Win32: keep the environment sorted
>  - Win32: use low-level memory allocation during initialization
>  - Win32: reduce environment array reallocations
>  - Win32: don't copy the environment twice when spawning child processes
>  - Win32: factor out environment block creation
>  - Win32: unify environment function names
>  - Win32: unify environment case-sensitivity
>  - Win32: fix environment memory leaks
>  - Win32: Unicode environment (incoming)
>  - Win32: Unicode environment (outgoing)
>  - Revert "Windows: teach getenv to do a case-sensitive search"
>  - tests: do not pass iso8859-1 encoded parameter
> 
>  Most of these are battle-tested in msysgit and are needed to
>  complete what has been merged to 'master' already.
> 
>  A fix has been squashed into "Unicode environ (outgoing)"; is this
>  now ready to go?
> 
> 
> * sk/mingw-tests-workaround (2014-07-21) 6 commits
>  - t800[12]: work around MSys limitation
>  - t9902: mingw-specific fix for gitfile link files
>  - t4210: skip command-line encoding tests on mingw
>  - MinGW: disable legacy encoding tests
>  - t0110/MinGW: skip tests that pass arbitrary bytes on the command line
>  - MinGW: Skip test redirecting to fd 4
>  (this branch is used by jc/not-mingw-cygwin.)
> 
>  Make tests pass on msysgit by mostly disabling ones that are
>  infeasible on that platform.
> 
>  The t0110 one has been replaced; is this now ready to go?
> 

Yes, I think both series are ready.

Compiles with msysgit and MSVC (with NO_CURL=1).

With the version in pu, three tests fail. t7001 is fixed with a newer 'cp'.
The other two are unrelated (introduced by nd/multiple-work-trees topic).

* t1501-worktree: failed 1
  As of 5bbcb072 "setup.c: support multi-checkout repo setup"
  Using $TRASH_DIRECTORY doesn't work on Windows.
  
* t2026-prune-linked-checkouts: failed 1
  As of 404a45f1 "prune: strategies for linked checkouts"
  Dito.

* t7001-mv: failed 6
  'cp -P' doesn't work due to outdated cp.exe.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in get_pwd_cwd() in Windows?

2014-07-23 Thread Karsten Blees
Am 23.07.2014 13:53, schrieb Duy Nguyen:
> On Wed, Jul 23, 2014 at 2:35 AM, René Scharfe  wrote:
>> Am 21.07.2014 16:13, schrieb Duy Nguyen:
>>
>>> This function tests if $PWD is the same as getcwd() using st_dev and
>>> st_ino. But on Windows these fields are always zero
>>> (mingw.c:do_lstat). If cwd is moved away, I think falling back to $PWD
>>> is wrong. I don't understand the use of $PWD in the first place.
>>> 1b9a946 (Use nonrelative paths instead of absolute paths for cloned
>>> repositories - 2008-06-05) does not explain much.
>>
>>
>> The commit message reads:
>>
>>   Particularly for the "alternates" file, if one will be created, we
>>   want a path that doesn't depend on the current directory, but we want
>>   to retain any symlinks in the path as given and any in the user's view
>>   of the current directory when the path was given.
>>
>> The intent of the patch seems to be to prefer $PWD if it points to the same
>> directory as the one returned by getcwd() in order to preserve "the user's
>> view".  That's why it introduces make_nonrelative_path() (now called
>> absolute_path()), in contrast to make_absolute_path() (now called
>> real_path()).
>>
>> I imagine it's useful e.g. if your home is accessed through a symlink:
>>
>> /home/foo -> /some/boring/mountpoint
>>
>> Then real_path("bar") would give you "/some/boring/mountpoint/bar", while
>> absolute_path("bar") returned "/home/foo/bar".  Not resolving symlinks keeps
>> the path friendly in this case.  And it keeps working even after the user's
>> home is migrated to /a/bigger/partition and /home/foo is updated
>> accordingly.
> 
> If it's saved back, then yes it's useful. And I think that's the case
> in clone.c. I was tempted to remove this code (because it only works
> if you stand at worktree's root dir anyway, else cwd is moved) but I
> guess we can just disable this code on Windows only instead.
> 

It is disabled on Windows as of 7d092adc get_pwd_cwd(): Do not trust 
st_dev/st_ino blindly



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] abspath.c: use PATH_MAX in real_path_internal()

2014-07-19 Thread Karsten Blees
Am 18.07.2014 12:49, schrieb Duy Nguyen:
> can be used, else we fall back to chdir. I think there are only four
> places that follow this pattern, here, setup.c (.git discovery), git.c
> (restore_env) and unix-socket.c. Enough call sites to make it worth
> the effort?
> 

real_path(): here we actually don't want to cd / cd back, we just do
this because getcwd reolves symlinks.

setup.c: AFAICT this sets up the work dir and stays there (no cd back).

git.c: Only does a 'preliminary' repo setup so that the alias mechanism
reads the correct config files. Then it reverts the setup if the
resulting git command doesn't need it. Perhaps it would be better (and
faster) to teach the alias code to read the right config file in the
first place?

unix-socket.c: This looks pretty broken. The cd / cd back logic is only
ever used if the socket path is too long. In this case, after cd'ing to
the parent directory of the socket, unix_stream_listen tries to unlink
the *original* socket path, instead of the remaining socket basename in
sockaddr_un.sun_path. I.e. the subsequent bind() will fail on second
invocation of the credential daemon.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] abspath.c: use PATH_MAX in real_path_internal()

2014-07-19 Thread Karsten Blees
Am 18.07.2014 13:32, schrieb René Scharfe:
> Am 18.07.2014 01:03, schrieb Karsten Blees:
>> Am 17.07.2014 19:05, schrieb René Scharfe:
>>> Am 17.07.2014 14:45, schrieb Nguyễn Thái Ngọc Duy:
>> [...]
>>> "These routines have traditionally been used by programs to save the
>>> name of a working directory for the purpose of returning to it. A much
>>> faster and less error-prone method of accomplishing this is to open the
>>> current directory (.) and use the fchdir(2) function to return."
>>>
>>
>> fchdir() is part of the POSIX-XSI extension, as is realpath(). So why not
>> use realpath() directly (which would also be thread-safe)?
> 
> That's a good question; thanks for stepping back and looking at the bigger 
> picture.  If there is widespread OS support for a functionality then we 
> should use it and just provide a compatibility implementation for those 
> platforms lacking it.  The downside is that compat code gets less testing.
> 

I just noticed that in contrast to the POSIX realpath(), our real_path() 
doesn't require the last path component to exist. I don't know if this property 
is required by the calling code, though.

> Seeing that readlink()

You mean realpath()? We don't have a stub for that yet.

> is left as a stub in compat/mingw.h that only errors out, would the 
> equivalent function on Windows be PathCanonicalize 
> (http://msdn.microsoft.com/en-us/library/windows/desktop/bb773569%28v=vs.85%29.aspx)?
> 

PathCanonicalize() doesn't return an absolute path, the realpath() equivalent 
would be GetFullPathName() (doesn't resolve symlinks) or 
GetFinalPathNameByHandle() (requires Vista, resolves symlinks, requires the 
path to exist).

>> For non-XSI-compliant platforms, we could keep the current implementation.
> 
> OK, so realpath() for Linux and the BSDs, mingw_realpath() wrapping 
> PathCanonicalize() for Windows and the current code for the rest?
> 
>> Or re-implement a thread-safe version, e.g. applying resolve_symlink() from
>> lockfile.c to all path components.
> 
> Thread safety sounds good.  We'd also need something like 
> normalize_path_copy() but without the conversion of backslashes to slashes, 
> in order to get rid of "." and ".." path components and something like 
> absolute_path() that doesn't die on error, no?
> 

Windows can handle forward slashes, so normalize_path_copy works just fine.

>> If I may bother you with the Windows point of view:
>>
>> There is no fchdir(), and I'm pretty sure open(".") won't work either.
> 
> On Windows, there *is* an absolute path length limit of 260 in the normal 
> case and a bit more than 32000 for some functions using the \\?\ namespace. 
> So one could get away with using a constant-sized buffer for a "remember the 
> place and return later" function here.
> 

The current directory is pretty much the only exception to the \\?\ trick [1]. 
So a fixed buffer for getcwd() would actually be fine on Windows (although it 
would have to be 3 * PATH_MAX, as PATH_MAX wide chars will convert to at most 3 
* PATH_MAX UTF-8 chars).

However, a POSIX conformant getcwd must fail with ERANGE if the buffer is too 
small. So a better alternative would be to add a strbuf_getcwd() that works 
similar to strbuf_readlink() (i.e. resize the buffer until its large enough).

Side note: the 'hard' 260 limit for the current directory also means that as 
long as we *simulate* realpath() via chdir()/getcwd(), long paths [1] don't 
work here.

> Also, _getcwd can be asked to allocate an appropriately-sized buffer for use, 
> like GNU's get_current_dir_name, by specifying NULL as its first parameter 
> (http://msdn.microsoft.com/en-us/library/sf98bd4y.aspx).
> 

We use nedmalloc in the Windows builds, so unfortuately we cannot free memory 
allocated by MSVCRT.dll.


[1] 
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365530%28v=vs.85%29.aspx
[2] https://github.com/msysgit/git/commit/84393750

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] t0110/MinGW: skip tests that pass arbitrary bytes on the command line

2014-07-19 Thread Karsten Blees
On Windows, the command line is a Unicode string, it is not possible to
pass arbitrary bytes to a program. Disable tests that try to do so.

Signed-off-by: Karsten Blees 
---
 t/t0110-urlmatch-normalization.sh | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/t/t0110-urlmatch-normalization.sh 
b/t/t0110-urlmatch-normalization.sh
index 8d6096d..410d576 100755
--- a/t/t0110-urlmatch-normalization.sh
+++ b/t/t0110-urlmatch-normalization.sh
@@ -117,7 +117,7 @@ test_expect_success 'url general escapes' '
test "$(test-urlmatch-normalization -p "X://W?'\!'")" = "x://w/?'\!'"
 '
 
-test_expect_success 'url high-bit escapes' '
+test_expect_success !MINGW 'url high-bit escapes' '
test "$(test-urlmatch-normalization -p "$(cat "$tu-1")")" = 
"x://q/%01%02%03%04%05%06%07%08%0E%0F%10%11%12" &&
test "$(test-urlmatch-normalization -p "$(cat "$tu-2")")" = 
"x://q/%13%14%15%16%17%18%19%1B%1C%1D%1E%1F%7F" &&
test "$(test-urlmatch-normalization -p "$(cat "$tu-3")")" = 
"x://q/%80%81%82%83%84%85%86%87%88%89%8A%8B%8C%8D%8E%8F" &&
@@ -127,7 +127,10 @@ test_expect_success 'url high-bit escapes' '
test "$(test-urlmatch-normalization -p "$(cat "$tu-7")")" = 
"x://q/%C0%C1%C2%C3%C4%C5%C6%C7%C8%C9%CA%CB%CC%CD%CE%CF" &&
test "$(test-urlmatch-normalization -p "$(cat "$tu-8")")" = 
"x://q/%D0%D1%D2%D3%D4%D5%D6%D7%D8%D9%DA%DB%DC%DD%DE%DF" &&
test "$(test-urlmatch-normalization -p "$(cat "$tu-9")")" = 
"x://q/%E0%E1%E2%E3%E4%E5%E6%E7%E8%E9%EA%EB%EC%ED%EE%EF" &&
-   test "$(test-urlmatch-normalization -p "$(cat "$tu-10")")" = 
"x://q/%F0%F1%F2%F3%F4%F5%F6%F7%F8%F9%FA%FB%FC%FD%FE%FF" &&
+   test "$(test-urlmatch-normalization -p "$(cat "$tu-10")")" = 
"x://q/%F0%F1%F2%F3%F4%F5%F6%F7%F8%F9%FA%FB%FC%FD%FE%FF"
+'
+
+test_expect_success 'url utf-8 escapes' '
test "$(test-urlmatch-normalization -p "$(cat "$tu-11")")" = 
"x://q/%C2%80%DF%BF%E0%A0%80%EF%BF%BD%F0%90%80%80%F0%AF%BF%BD"
 '
 
-- 
2.0.2.906.g50cb2fc.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fixup! Win32: Unicode environment (outgoing)

2014-07-19 Thread Karsten Blees
compat/mingw.c needs to #include "cache.h" for ALLOC_GROW.

Signed-off-by: Karsten Blees 
---
 compat/mingw.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/compat/mingw.c b/compat/mingw.c
index bd45950..c725a3e 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -4,6 +4,7 @@
 #include 
 #include "../strbuf.h"
 #include "../run-command.h"
+#include "../cache.h"
 
 static const int delay[] = { 0, 1, 10, 20, 40 };
 
-- 
2.0.2.897.g7f80809.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] abspath.c: use PATH_MAX in real_path_internal()

2014-07-17 Thread Karsten Blees
Am 17.07.2014 19:05, schrieb René Scharfe:
> Am 17.07.2014 14:45, schrieb Nguyễn Thái Ngọc Duy:
[...]
> "These routines have traditionally been used by programs to save the
> name of a working directory for the purpose of returning to it. A much
> faster and less error-prone method of accomplishing this is to open the
> current directory (.) and use the fchdir(2) function to return."
> 

fchdir() is part of the POSIX-XSI extension, as is realpath(). So why not
use realpath() directly (which would also be thread-safe)?

For non-XSI-compliant platforms, we could keep the current implementation.
Or re-implement a thread-safe version, e.g. applying resolve_symlink() from
lockfile.c to all path components.


If I may bother you with the Windows point of view: 

There is no fchdir(), and I'm pretty sure open(".") won't work either.

fchdir() could be emulated using GetFileInformationByHandleEx(FileNameInfo).
realpath() is pretty much what GetFinalPathNameByHandle() does. However,
both of these APIs require Windows Vista.

Opening a handle to a directory can be done using FILE_FLAG_BACKUP_SEMANTICS,
which AFAICT MSVCRT.dll's open() implementation does _not_ do (could be
emulated in our mingw_open() wrapper, though).

...lots of work for little benefit, I would think.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] abspath.c: use PATH_MAX in real_path_internal()

2014-07-17 Thread Karsten Blees
Am 17.07.2014 20:03, schrieb Junio C Hamano:
> Nguyễn Thái Ngọc Duy   writes:
> 
>> This array 'cwd' is used to store the result from getcwd() and chdir()
>> back. PATH_MAX is the right constant for the job. On systems with
>> longer PATH_MAX (eg. 4096 on Linux), hard coding 1024 fails stuff,
>> e.g. "git init". Make it static too to reduce stack usage.
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>> ---
> 
> Thanks.  It seems that this 1024 has been with us since the
> beginning of this piece of code.  I briefly wondered if there are
> strange platform that will have PATH_MAX shorter than 1024 that will
> be hurt by this change, but the result in cwd[] is used to grow the
> final result bufs[] that is sized based on PATH_MAX anyway, so it
> will not be an issue (besides, the absurdly short one seems to be
> a different macro, MAX_PATH, on Windows).
> 

Indeed, there's a strange platform where PATH_MAX is only 259. With
Unicode support, the current directory can be that many wide characters,
which means up to 3 * PATH_MAX UTF-8 bytes (if all of them are CJK).

I don't think this will be a problem, though, as the return buffer is
PATH_MAX-bounded as well.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] abspath.c: use PATH_MAX in real_path_internal()

2014-07-17 Thread Karsten Blees
Am 17.07.2014 14:45, schrieb Nguyễn Thái Ngọc Duy:
> e.g. "git init". Make it static too to reduce stack usage.

But wouldn't this increase overall memory usage? Stack memory
will be reused by subsequent code, while static memory cannot
be reused (but still increases the process working set).

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] MinGW: Skip test redirecting to fd 4

2014-07-17 Thread Karsten Blees
Am 17.07.2014 20:41, schrieb Junio C Hamano:
> Stepan Kasal  writes:
> 
>> From: Johannes Schindelin 
>>
>> ... because that does not work in MinGW.
>>
>> Signed-off-by: Johannes Schindelin 
>> Signed-off-by: Stepan Kasal 
>> ---
>>  t/t0081-line-buffer.sh | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/t/t0081-line-buffer.sh b/t/t0081-line-buffer.sh
>> index bd83ed3..25dba00 100755
>> --- a/t/t0081-line-buffer.sh
>> +++ b/t/t0081-line-buffer.sh
>> @@ -29,7 +29,7 @@ test_expect_success '0-length read, send along greeting' '
>>  test_cmp expect actual
>>  '
>>  
>> -test_expect_success 'read from file descriptor' '
>> +test_expect_success NOT_MINGW 'read from file descriptor' '
>>  rm -f input &&
>>  echo hello >expect &&
>>  echo hello >input &&
> 
> Hmm, the point of this test seems to be to exercise buffer_fdinit(),
> instead of buffer_init(), and the file descriptor does not have to
> be "4" for the purpose of the test, no?
> 
> Is what is broken on MinGW redirecting arbitrary file descrptors?

Yes. 0, 1 and 2 work (vie GetStdHandle), but anything else is handled
by the C-runtime. And as MSYS.dll (bash) and MSVCRT.dll (git) do it in
different ways, it doesn't work.

> - echo copy 6 |
> - test-line-buffer "&4" 4actual &&
> + test-line-buffer "&0" actual &&

test-line-buffer already reads commands ("copy 6") from stdin, so stdin cannot
be reused for the data stream, unfortunately.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/13] mingw unicode environment

2014-07-17 Thread Karsten Blees
Am 17.07.2014 21:00, schrieb Stepan Kasal:
> Hi,
> 
>> Karsten Blees  writes:
>>> I believe we prefer moving code to the right place over forward
>>> declarations (IIRC I got bashed for the latter in one of the first rounds
>>> of this patch series). If only to justify 'git-blame -M' :-D
> 
> indeed, my position is the same, generally.
> 
> But it turned out that the current ordering is sane, mostly works as it is,
> and I needed _only one_ fwd decl to make things compile.  This is why I
> decided to have things arranged this way.
> 

Fine with me.

However, if it *did* compile for you, I wonder where ALLOC_GROW (as of #02/13)
and alloc_nr (as of #10/13) came from? Or did we recently remove '#include 
"cache.h"'
from upstream mingw.c?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/6] Disable t0110's high-bit test on Windows

2014-07-17 Thread Karsten Blees
Am 17.07.2014 17:37, schrieb Stepan Kasal:
> From: Johannes Schindelin 
> 
> The bash Git for Windows uses (i.e. the MSys bash) cannot pass
> command-line arguments with high bits set verbatim to non-MSys programs,
> but instead converts those characters with high bits set to their hex
> representation.
> 

The description is not entirely correct...the Unicode-enabled MSYS.dll
expects the command line to be UTF-8. Only *invalid* UTF-8 is converted
to hex code for convenience. So its not the high bits that cause trouble,
but specifying 0x80 without proper UTF-8 lead byte.

I believe the last line of the test may actually work:

test "$(test-urlmatch-normalization -p "$(cat "$tu-11")")" = 
"x://q/%C2%80%DF%BF%E0%A0%80%EF%BF%BD%F0%90%80%80%F0%AF%BF%BD"

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/13] mingw unicode environment

2014-07-17 Thread Karsten Blees
Am 17.07.2014 17:37, schrieb Stepan Kasal:
> Hello,
> 
> this is the remainder of Karsten's unicode branch, that is a time
> proven part of msysGit.  (If this code is accepted, only one patch
> would only remain: gitk and git-gui fixes.)
> 

Thank you so much!

I had to add '#include "../cache.h"' to compile, due to use of
ALLOC_GROW and alloc_nr in some of the patches. In the msysgit HEAD,
Erik's hideDotFile patch does that [1].

[1] https://github.com/msysgit/git/commit/d85d2b75

After that (and applying your mingw test fixes), only t7001 fails
(the 'cp -P' issue).

> When rebasing Karsten's work, I have eliminated two commits:
> https://github.com/msysgit/git/commit/f967550
> https://github.com/msysgit/git/commit/290bf81
> 
> These commits only moved code down and up; this was not necessary, one
> forward declaration was all I needed.
> 

I believe we prefer moving code to the right place over forward
declarations (IIRC I got bashed for the latter in one of the first rounds
of this patch series). If only to justify 'git-blame -M' :-D

> One of the patches differs from the original version: "Enable color..."
> Following Karsten's suggestion, I have changed the value of env. var.
> TERM from "winterm" to "cygwin".  This is because the subprocesses see
> the variable and may try to find it in (their copy of) termcap.
> 

Good! One more step towards getting rid of the git-wrapper.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] config: use chmod() instead of fchmod()

2014-07-17 Thread Karsten Blees
Am 17.07.2014 00:16, schrieb Junio C Hamano:
> Karsten Blees  writes:
> 
>> There is no fchmod() on native Windows platforms (MinGW and MSVC), and the
>> equivalent Win32 API (SetFileInformationByHandle) requires Windows Vista.
>>
>> Use chmod() instead.
>>
>> Signed-off-by: Karsten Blees 
>> ---
> 
> I am wondering if it is saner to just revert the fchmod() patch and
> replace it with something along the lines of
> 
> http://thread.gmane.org/gmane.comp.version-control.git/251682/focus=253219
> 

I also think it makes a lot of sense to handle permissions centrally.

However, with this patch, the permissions of the target file will
additionally be limited by umask (by passing them to open()), and then
overridden completely if core.sharedRepository is set.

Perhaps the lockfile API should respect the location of the lock files
(i.e. use core.sharedRepository in .git, 0666 in the work-tree, and
copy permissions anywhere else).

Another thing I find strange is that, by doing copy/replace, git silently
overwrites readonly files. If we grab the permissions from the source
file anyway, we should perhaps add 'if (!(perms & 0222)) error("file
is readonly");', or even 'access(filename, W_OK)'?

> Having said that, these are the only two callers of fchmod()
> currently in our code base, so I'll queue this patch to allow us to
> kick the problem-can down the road ;-)
> 

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] config: use chmod() instead of fchmod()

2014-07-16 Thread Karsten Blees
Am 16.07.2014 07:33, schrieb Johannes Sixt:
> Am 16.07.2014 00:54, schrieb Karsten Blees:
>> There is no fchmod() on native Windows platforms (MinGW and MSVC), and the
>> equivalent Win32 API (SetFileInformationByHandle) requires Windows Vista.
>>
>> Use chmod() instead.
>>
>> Signed-off-by: Karsten Blees 
>> ---
>>  config.c | 8 
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/config.c b/config.c
>> index ba882a1..9767c4b 100644
>> --- a/config.c
>> +++ b/config.c
>> @@ -1636,8 +1636,8 @@ int git_config_set_multivar_in_file(const char 
>> *config_filename,
>>  MAP_PRIVATE, in_fd, 0);
>>  close(in_fd);
>>  
>> -if (fchmod(fd, st.st_mode & 0) < 0) {
>> -error("fchmod on %s failed: %s",
>> +if (chmod(lock->filename, st.st_mode & 0) < 0) {
>> +error("chmod on %s failed: %s",
>>  lock->filename, strerror(errno));
>>  ret = CONFIG_NO_WRITE;
>>  goto out_free;
>> @@ -1815,8 +1815,8 @@ int git_config_rename_section_in_file(const char 
>> *config_filename,
>>  
>>  fstat(fileno(config_file), &st);
>>  
>> -if (fchmod(out_fd, st.st_mode & 0) < 0) {
>> -ret = error("fchmod on %s failed: %s",
>> +if (chmod(lock->filename, st.st_mode & 0) < 0) {
>> +ret = error("chmod on %s failed: %s",
>>  lock->filename, strerror(errno));
>>  goto out;
>>  }
>>
> 
> I assume you tested this patch on Windows. I am mildly surprised that
> (on Windows) chmod() works on a file that is still open.
> 
> -- Hannes
> 

Yes, file attributes can be set independently of open files. In fact, existing
code in git already does that in many places (via adjust_shared_perm(), which
is typically called while the file is open).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] MinGW: fix compile error due to missing ELOOP

2014-07-15 Thread Karsten Blees
Am 16.07.2014 01:42, schrieb Jonathan Nieder:
> Karsten Blees wrote:
> 
>> MinGW and MSVC before 2010 don't define ELOOP, use EMLINK (aka "Too many
>> links") instead.
> [...]
>> +#ifndef ELOOP
>> +#define ELOOP EMLINK
>> +#endif
> 
> This could use
> 
>   #define ELOOP WSAELOOP
> 
> as an alternative.  But it shouldn't matter since git doesn't look for
> EMLINK anywhere (EMLINK = 31, WSAELOOP = wsabaseerr+62 = 10062).
> 
> Reviewed-by: Jonathan Nieder 
> 

It matters when we report the error to the user (i.e. via die_errno):

strerror(EMLINK) -> "Too many links"
strerror(10062)  -> "Unknown error"

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] config: use chmod() instead of fchmod()

2014-07-15 Thread Karsten Blees
There is no fchmod() on native Windows platforms (MinGW and MSVC), and the
equivalent Win32 API (SetFileInformationByHandle) requires Windows Vista.

Use chmod() instead.

Signed-off-by: Karsten Blees 
---
 config.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/config.c b/config.c
index ba882a1..9767c4b 100644
--- a/config.c
+++ b/config.c
@@ -1636,8 +1636,8 @@ int git_config_set_multivar_in_file(const char 
*config_filename,
MAP_PRIVATE, in_fd, 0);
close(in_fd);
 
-   if (fchmod(fd, st.st_mode & 0) < 0) {
-   error("fchmod on %s failed: %s",
+   if (chmod(lock->filename, st.st_mode & 0) < 0) {
+   error("chmod on %s failed: %s",
lock->filename, strerror(errno));
ret = CONFIG_NO_WRITE;
goto out_free;
@@ -1815,8 +1815,8 @@ int git_config_rename_section_in_file(const char 
*config_filename,
 
fstat(fileno(config_file), &st);
 
-   if (fchmod(out_fd, st.st_mode & 0) < 0) {
-   ret = error("fchmod on %s failed: %s",
+   if (chmod(lock->filename, st.st_mode & 0) < 0) {
+   ret = error("chmod on %s failed: %s",
lock->filename, strerror(errno));
goto out;
}
-- 
2.0.1.779.g26aeac4.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] MinGW: fix compile error due to missing ELOOP

2014-07-15 Thread Karsten Blees
MinGW and MSVC before 2010 don't define ELOOP, use EMLINK (aka "Too many
links") instead.

Signed-off-by: Karsten Blees 
---
 compat/mingw.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/compat/mingw.h b/compat/mingw.h
index 405c08f..510530c 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -35,6 +35,9 @@ typedef int socklen_t;
 #ifndef EWOULDBLOCK
 #define EWOULDBLOCK EAGAIN
 #endif
+#ifndef ELOOP
+#define ELOOP EMLINK
+#endif
 #define SHUT_WR SD_SEND
 
 #define SIGHUP 1
-- 
2.0.1.779.g26aeac4.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] fix test suite with mingw-unicode patches

2014-07-15 Thread Karsten Blees
Am 15.07.2014 20:20, schrieb Junio C Hamano:
> Stepan Kasal  writes:
> 
>> Hello Hannes,
>> attached please find the patches that Karsten pointed out:
>>
>> 1) The unicode file name support was omitted from his unicode patch
>> series; my mistake, sorry.  There is still big part missing: support
>> for unicode environment; I can only hope the tests would choke on
>> that.
>>
>> 2) Windows cannot pass non-UTF parameters (commit messages in this
>> case): original patch by Pat Thoyts was extended to apply to other
>> similar cases: the commit msg is passed through stdin.
>>
>> If there are still problems remaining, please tell us.
>>
>> Thanks,
>>  Stepan
>>
>> Karsten Blees (2):
>>   Win32: Unicode file name support (except dirent)
>>   Win32: Unicode file name support (dirent)
>>
>> Pat Thoyts and Stepan Kasal(1):
>>   tests: do not pass iso8859-1 encoded parameter
> 
> Thanks.  I'll queue these and wait for Windows folks to respond.
> With favourable feedback they can go directly from pu to master, I
> would think.
> 

Looking good. After fixing the ELOOP and fchmod issues (see followup
patches), there are 9 test failures left. Only one of these is
environment related, and for the rest we have fixes in the msysgit
fork:


* t0081-line-buffer: 1

Using file descriptor other than 0, 1, 2.
https://github.com/msysgit/git/commit/4940c51a


* t0110-urlmatch-normalization: 1

Passing binary data on the command line...would have to teach 
test-urlmatch-normalization.c to read from stdin or file.
https://github.com/msysgit/git/commit/be0d6dee


* t4036-format-patch-signer-mime: 1

not ok 4 - format with non ASCII signer name
#
#   GIT_COMMITTER_NAME="はまの ふにおう" \
#   git format-patch -s --stdout -1 >output &&
#   grep Content-Type output
#

Passing non-ASCII by environment variable, will be fixed by Unicode environment 
support.


* t4201-shortlog: 3

Passing binary data on the command line ('git-commit -m').
https://github.com/msysgit/git/commit/3717ce1b


* t4210-log-i18n: 2

Passing binary data on the command line ('git log --grep=$latin1_e').
https://github.com/msysgit/git/commit/dd2defa3


* t7001-mv: 6

cp -P fails in MinGW - perhaps use the long option forms (--no-dereference)?
https://github.com/msysgit/git/commit/00764ca1


* t8001-annotate/t8002-blame: 5

Msys.dll thinks '-L/regex/' is an absolute path and expands to 
'-LC:/msysgit/regex/'.
https://github.com/msysgit/git/commit/2d52168a


* t8005-blame-i18n: 4

Passing binary data on the command line ('git-commit --author -m').
https://github.com/msysgit/git/commit/3717ce1b


* t9902-completion: 2

Must use 'pwd -W' to get Windows-style absolute paths.
https://github.com/msysgit/git/commit/9b612448

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/3] dir.c: coding style fix

2014-07-15 Thread Karsten Blees
Am 15.07.2014 00:30, schrieb Junio C Hamano:
> Karsten Blees  writes:
> 
>> From: =?UTF-8?q?Nguy=E1=BB=85n=20Th=C3=A1i=20Ng=E1=BB=8Dc=20Duy?=
>>  
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>> Signed-off-by: Karsten Blees 
>> ---
> 
> Thanks for forwarding.   I'll fix-up the Yikes (see how these two
> lines show the same name in a very different way), but how did you
> produce the above?  Is there some fix we need in the toolchain that
> produces patch e-mails?
> 

Hmmm...I simply thought that this is how its supposed to work. Mail
headers can only contain US-ASCII, so the RFC 2047 Q-encoded-word
generated by git-format-patch looked good to me.

It seems git-mailinfo doesn't handle line breaks in header lines in
the mail body. I.e. if you remove the LF in the 'From:' line,
everything is fine, despite the Q-encoding.

Now, 'git-format-patch --from' seems to work around the problem, but
only for the 'From:' line. AFAICT there's no such option for
'Subject:', e.g. if you want to paste a patch after a scissors line,
you're on your own (see the example in the git-format-patch(1)
discussion section, with 'Subject:' both Q-encoded and line-wrapped).

Perhaps it should be clarified that git-format-patch output is not
suitable for pasting into mail clients? Or it should print headers
in plain text and let git-send-email handle the conversions?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: No fchmod() under msygit - Was: Re: [PATCH 00/14] Add submodule test harness

2014-07-14 Thread Karsten Blees
Am 09.07.2014 22:00, schrieb Eric Wong:
> Torsten Bögershausen  wrote:
>> (And why is it  "& 0" and not  "& 0777")
> 
> This is to preserve the uncommon sticky/sgid/suid bits.  Probably not
> needed, but better to keep as much intact as possible.
> 
>> Can we avoid the fchmod()  all together ?
> 
> For single-user systems, sure.
> 
> For multi-user systems with git-imap-send users and passwords in
> $GIT_CONFIG, I suggest not.
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

The Windows fchmod() problem is easily solved by using chmod() instead of
fchmod(). The file name is known in both cases (and even used in the error
message).

However, IMO there are more fundamental problems with your patch (not
related to Windows):

1.) Permissions of files in .git are controlled by the core.sharedRepository
setting, and your patch seems to break that (i.e. if someone accidentally
has made .git/config world readable, git-config no longer fixes that, even
if core.sharedRepository=0600).

2.) Sensitive data such as passwords doesn't belong in config files in the
first place, that's what git-credentials is good for.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 3/3] prep_exclude: remove the artificial PATH_MAX limit

2014-07-14 Thread Karsten Blees
From: =?UTF-8?q?Nguy=E1=BB=85n=20Th=C3=A1i=20Ng=E1=BB=8Dc=20Duy?=
 

This fixes a segfault in git-status with long paths on Windows,
where PATH_MAX is only 260.

This also fixes the problem of silently ignoring .gitignore if the
full path exceeds PATH_MAX. Now add_excludes_from_file() will report
if it gets ENAMETOOLONG.

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Karsten Blees 
---
 dir.c | 47 ---
 dir.h |  2 +-
 2 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/dir.c b/dir.c
index 3068ca8..fcb6872 100644
--- a/dir.c
+++ b/dir.c
@@ -799,12 +799,12 @@ static void prep_exclude(struct dir_struct *dir, const 
char *base, int baselen)
 */
while ((stk = dir->exclude_stack) != NULL) {
if (stk->baselen <= baselen &&
-   !strncmp(dir->basebuf, base, stk->baselen))
+   !strncmp(dir->basebuf.buf, base, stk->baselen))
break;
el = &group->el[dir->exclude_stack->exclude_ix];
dir->exclude_stack = stk->prev;
dir->exclude = NULL;
-   free((char *)el->src); /* see strdup() below */
+   free((char *)el->src); /* see strbuf_detach() below */
clear_exclude_list(el);
free(stk);
group->nr--;
@@ -814,8 +814,17 @@ static void prep_exclude(struct dir_struct *dir, const 
char *base, int baselen)
if (dir->exclude)
return;
 
+   /*
+* Lazy initialization. All call sites currently just
+* memset(dir, 0, sizeof(*dir)) before use. Changing all of
+* them seems lots of work for little benefit.
+*/
+   if (!dir->basebuf.buf)
+   strbuf_init(&dir->basebuf, PATH_MAX);
+
/* Read from the parent directories and push them down. */
current = stk ? stk->baselen : -1;
+   strbuf_setlen(&dir->basebuf, current < 0 ? 0 : current);
while (current < baselen) {
struct exclude_stack *stk = xcalloc(1, sizeof(*stk));
const char *cp;
@@ -833,48 +842,47 @@ static void prep_exclude(struct dir_struct *dir, const 
char *base, int baselen)
stk->baselen = cp - base;
stk->exclude_ix = group->nr;
el = add_exclude_list(dir, EXC_DIRS, NULL);
-   memcpy(dir->basebuf + current, base + current,
-  stk->baselen - current);
+   strbuf_add(&dir->basebuf, base + current, stk->baselen - 
current);
+   assert(stk->baselen == dir->basebuf.len);
 
/* Abort if the directory is excluded */
if (stk->baselen) {
int dt = DT_DIR;
-   dir->basebuf[stk->baselen - 1] = 0;
+   dir->basebuf.buf[stk->baselen - 1] = 0;
dir->exclude = last_exclude_matching_from_lists(dir,
-   dir->basebuf, stk->baselen - 1,
-   dir->basebuf + current, &dt);
-   dir->basebuf[stk->baselen - 1] = '/';
+   dir->basebuf.buf, stk->baselen - 1,
+   dir->basebuf.buf + current, &dt);
+   dir->basebuf.buf[stk->baselen - 1] = '/';
if (dir->exclude &&
dir->exclude->flags & EXC_FLAG_NEGATIVE)
dir->exclude = NULL;
if (dir->exclude) {
-   dir->basebuf[stk->baselen] = 0;
dir->exclude_stack = stk;
return;
}
}
 
-   /* Try to read per-directory file unless path is too long */
-   if (dir->exclude_per_dir &&
-   stk->baselen + strlen(dir->exclude_per_dir) < PATH_MAX) {
-   strcpy(dir->basebuf + stk->baselen,
-   dir->exclude_per_dir);
+   /* Try to read per-directory file */
+   if (dir->exclude_per_dir) {
/*
 * dir->basebuf gets reused by the traversal, but we
 * need fname to remain unchanged to ensure the src
 * member of each struct exclude correctly
 * back-references its source file.  Other invocations
 * of add_exclude_list provide stable strings, so we
-* strdup() and free() here in the caller.
+* strbuf_detach() and free() here in the 

[PATCH v1 2/3] dir.h: move struct exclude declaration to top level

2014-07-14 Thread Karsten Blees
From: =?UTF-8?q?Nguy=E1=BB=85n=20Th=C3=A1i=20Ng=E1=BB=8Dc=20Duy?=
 

There is no actual nested struct here. Move it out for clarity.

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Karsten Blees 
---
 dir.h | 42 ++
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/dir.h b/dir.h
index 55e5345..02e3710 100644
--- a/dir.h
+++ b/dir.h
@@ -15,6 +15,27 @@ struct dir_entry {
 #define EXC_FLAG_MUSTBEDIR 8
 #define EXC_FLAG_NEGATIVE 16
 
+struct exclude {
+   /*
+* This allows callers of last_exclude_matching() etc.
+* to determine the origin of the matching pattern.
+*/
+   struct exclude_list *el;
+
+   const char *pattern;
+   int patternlen;
+   int nowildcardlen;
+   const char *base;
+   int baselen;
+   int flags;
+
+   /*
+* Counting starts from 1 for line numbers in ignore files,
+* and from -1 decrementing for patterns from CLI args.
+*/
+   int srcpos;
+};
+
 /*
  * Each excludes file will be parsed into a fresh exclude_list which
  * is appended to the relevant exclude_list_group (either EXC_DIRS or
@@ -32,26 +53,7 @@ struct exclude_list {
/* origin of list, e.g. path to filename, or descriptive string */
const char *src;
 
-   struct exclude {
-   /*
-* This allows callers of last_exclude_matching() etc.
-* to determine the origin of the matching pattern.
-*/
-   struct exclude_list *el;
-
-   const char *pattern;
-   int patternlen;
-   int nowildcardlen;
-   const char *base;
-   int baselen;
-   int flags;
-
-   /*
-* Counting starts from 1 for line numbers in ignore files,
-* and from -1 decrementing for patterns from CLI args.
-*/
-   int srcpos;
-   } **excludes;
+   struct exclude **excludes;
 };
 
 /*
-- 
2.0.0.9646.g840d1f9.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 1/3] dir.c: coding style fix

2014-07-14 Thread Karsten Blees
From: =?UTF-8?q?Nguy=E1=BB=85n=20Th=C3=A1i=20Ng=E1=BB=8Dc=20Duy?=
 

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Karsten Blees 
---
 dir.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/dir.c b/dir.c
index e65888d..3068ca8 100644
--- a/dir.c
+++ b/dir.c
@@ -557,8 +557,7 @@ int add_excludes_from_file_to_list(const char *fname,
buf = xrealloc(buf, size+1);
buf[size++] = '\n';
}
-   }
-   else {
+   } else {
size = xsize_t(st.st_size);
if (size == 0) {
close(fd);
@@ -793,9 +792,11 @@ static void prep_exclude(struct dir_struct *dir, const 
char *base, int baselen)
 
group = &dir->exclude_list_group[EXC_DIRS];
 
-   /* Pop the exclude lists from the EXCL_DIRS exclude_list_group
+   /*
+* Pop the exclude lists from the EXCL_DIRS exclude_list_group
 * which originate from directories not in the prefix of the
-* path being checked. */
+* path being checked.
+*/
while ((stk = dir->exclude_stack) != NULL) {
if (stk->baselen <= baselen &&
!strncmp(dir->basebuf, base, stk->baselen))
@@ -822,8 +823,7 @@ static void prep_exclude(struct dir_struct *dir, const char 
*base, int baselen)
if (current < 0) {
cp = base;
current = 0;
-   }
-   else {
+   } else {
cp = strchr(base + current + 1, '/');
if (!cp)
die("oops in prep_exclude");
-- 
2.0.0.9646.g840d1f9.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 0/3] dir.[ch]: remove PATH_MAX limitation

2014-07-14 Thread Karsten Blees
As discussed in [1], here's the first three patches of Duy's untracked 
cache series, which fixes a segfault with long paths on Windows.

[1] http://article.gmane.org/gmane.comp.version-control.msysgit/20702

Nguyễn Thái Ngọc Duy (3):
  dir.c: coding style fix
  dir.h: move struct exclude declaration to top level
  prep_exclude: remove the artificial PATH_MAX limit

 dir.c | 59 ++-
 dir.h | 44 +++-
 2 files changed, 57 insertions(+), 46 deletions(-)

-- 
2.0.0.9646.g840d1f9.dirty
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 17/17] api-trace.txt: add trace API documentation

2014-07-11 Thread Karsten Blees
Signed-off-by: Karsten Blees 
---
 Documentation/technical/api-trace.txt | 97 +++
 1 file changed, 97 insertions(+)
 create mode 100644 Documentation/technical/api-trace.txt

diff --git a/Documentation/technical/api-trace.txt 
b/Documentation/technical/api-trace.txt
new file mode 100644
index 000..097a651
--- /dev/null
+++ b/Documentation/technical/api-trace.txt
@@ -0,0 +1,97 @@
+trace API
+=
+
+The trace API can be used to print debug messages to stderr or a file. Trace
+code is inactive unless explicitly enabled by setting `GIT_TRACE*` environment
+variables.
+
+The trace implementation automatically adds `timestamp file:line ... \n` to
+all trace messages. E.g.:
+
+
+23:59:59.123456 git.c:312   trace: built-in: git 'foo'
+00:00:00.01 builtin/foo.c:99foo: some message
+
+
+Data Structures
+---
+
+`struct trace_key`::
+
+   Defines a trace key (or category). The default (for API functions that
+   don't take a key) is `GIT_TRACE`.
++
+E.g. to define a trace key controlled by environment variable `GIT_TRACE_FOO`:
++
+
+static struct trace_key trace_foo = TRACE_KEY_INIT(FOO);
+
+static void trace_print_foo(const char *message)
+{
+   trace_print_key(&trace_foo, message);
+}
+
++
+Note: don't use `const` as the trace implementation stores internal state in
+the `trace_key` structure.
+
+Functions
+-
+
+`int trace_want(struct trace_key *key)`::
+
+   Checks whether the trace key is enabled. Used to prevent expensive
+   string formatting before calling one of the printing APIs.
+
+`void trace_disable(struct trace_key *key)`::
+
+   Disables tracing for the specified key, even if the environment
+   variable was set.
+
+`void trace_printf(const char *format, ...)`::
+`void trace_printf_key(struct trace_key *key, const char *format, ...)`::
+
+   Prints a formatted message, similar to printf.
+
+`void trace_argv_printf(const char **argv, const char *format, ...)``::
+
+   Prints a formatted message, followed by a quoted list of arguments.
+
+`void trace_strbuf(struct trace_key *key, const struct strbuf *data)`::
+
+   Prints the strbuf, without additional formatting (i.e. doesn't
+   choke on `%` or even `\0`).
+
+`uint64_t getnanotime(void)`::
+
+   Returns nanoseconds since the epoch (01/01/1970), typically used
+   for performance measurements.
++
+Currently there are high precision timer implementations for Linux (using
+`clock_gettime(CLOCK_MONOTONIC)`) and Windows (`QueryPerformanceCounter`).
+Other platforms use `gettimeofday` as time source.
+
+`void trace_performance(uint64_t nanos, const char *format, ...)`::
+`void trace_performance_since(uint64_t start, const char *format, ...)`::
+
+   Prints the elapsed time (in nanoseconds), or elapsed time since
+   `start`, followed by a formatted message. Enabled via environment
+   variable `GIT_TRACE_PERFORMANCE`. Used for manual profiling, e.g.:
++
+
+uint64_t start = getnanotime();
+/* code section to measure */
+trace_performance_since(start, "foobar");
+
++
+
+uint64_t t = 0;
+for (;;) {
+   /* ignore */
+   t -= getnanotime();
+   /* code section to measure */
+   t += getnanotime();
+   /* ignore */
+}
+trace_performance(t, "frotz");
+
-- 
2.0.0.406.g2e9ef9b

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 16/17] progress: simplify performance measurement by using getnanotime()

2014-07-11 Thread Karsten Blees
Calculating duration from a single uint64_t is simpler than from a struct
timeval. Change throughput measurement from gettimeofday() to
getnanotime().

Also calculate misec only if needed, and change integer division to integer
multiplication + shift, which should be slightly faster.

Signed-off-by: Karsten Blees 
Signed-off-by: Junio C Hamano 
---
 progress.c | 71 +++---
 1 file changed, 36 insertions(+), 35 deletions(-)

diff --git a/progress.c b/progress.c
index 261314e..412e6b1 100644
--- a/progress.c
+++ b/progress.c
@@ -12,13 +12,14 @@
 #include "gettext.h"
 #include "progress.h"
 #include "strbuf.h"
+#include "trace.h"
 
 #define TP_IDX_MAX  8
 
 struct throughput {
off_t curr_total;
off_t prev_total;
-   struct timeval prev_tv;
+   uint64_t prev_ns;
unsigned int avg_bytes;
unsigned int avg_misecs;
unsigned int last_bytes[TP_IDX_MAX];
@@ -127,65 +128,65 @@ static void throughput_string(struct strbuf *buf, off_t 
total,
 void display_throughput(struct progress *progress, off_t total)
 {
struct throughput *tp;
-   struct timeval tv;
-   unsigned int misecs;
+   uint64_t now_ns;
+   unsigned int misecs, count, rate;
+   struct strbuf buf = STRBUF_INIT;
 
if (!progress)
return;
tp = progress->throughput;
 
-   gettimeofday(&tv, NULL);
+   now_ns = getnanotime();
 
if (!tp) {
progress->throughput = tp = calloc(1, sizeof(*tp));
if (tp) {
tp->prev_total = tp->curr_total = total;
-   tp->prev_tv = tv;
+   tp->prev_ns = now_ns;
}
return;
}
tp->curr_total = total;
 
+   /* only update throughput every 0.5 s */
+   if (now_ns - tp->prev_ns <= 5)
+   return;
+
/*
-* We have x = bytes and y = microsecs.  We want z = KiB/s:
+* We have x = bytes and y = nanosecs.  We want z = KiB/s:
 *
-*  z = (x / 1024) / (y / 100)
-*  z = x / y * 100 / 1024
-*  z = x / (y * 1024 / 100)
+*  z = (x / 1024) / (y / 10)
+*  z = x / y * 10 / 1024
+*  z = x / (y * 1024 / 10)
 *  z = x / y'
 *
 * To simplify things we'll keep track of misecs, or 1024th of a sec
 * obtained with:
 *
-*  y' = y * 1024 / 100
-*  y' = y / (100 / 1024)
-*  y' = y / 977
+*  y' = y * 1024 / 10
+*  y' = y * (2^10 / 2^42) * (2^42 / 10)
+*  y' = y / 2^32 * 4398
+*  y' = (y * 4398) >> 32
 */
-   misecs = (tv.tv_sec - tp->prev_tv.tv_sec) * 1024;
-   misecs += (int)(tv.tv_usec - tp->prev_tv.tv_usec) / 977;
+   misecs = ((now_ns - tp->prev_ns) * 4398) >> 32;
 
-   if (misecs > 512) {
-   struct strbuf buf = STRBUF_INIT;
-   unsigned int count, rate;
+   count = total - tp->prev_total;
+   tp->prev_total = total;
+   tp->prev_ns = now_ns;
+   tp->avg_bytes += count;
+   tp->avg_misecs += misecs;
+   rate = tp->avg_bytes / tp->avg_misecs;
+   tp->avg_bytes -= tp->last_bytes[tp->idx];
+   tp->avg_misecs -= tp->last_misecs[tp->idx];
+   tp->last_bytes[tp->idx] = count;
+   tp->last_misecs[tp->idx] = misecs;
+   tp->idx = (tp->idx + 1) % TP_IDX_MAX;
 
-   count = total - tp->prev_total;
-   tp->prev_total = total;
-   tp->prev_tv = tv;
-   tp->avg_bytes += count;
-   tp->avg_misecs += misecs;
-   rate = tp->avg_bytes / tp->avg_misecs;
-   tp->avg_bytes -= tp->last_bytes[tp->idx];
-   tp->avg_misecs -= tp->last_misecs[tp->idx];
-   tp->last_bytes[tp->idx] = count;
-   tp->last_misecs[tp->idx] = misecs;
-   tp->idx = (tp->idx + 1) % TP_IDX_MAX;
-
-   throughput_string(&buf, total, rate);
-   strncpy(tp->display, buf.buf, sizeof(tp->display));
-   strbuf_release(&buf);
-   if (progress->last_value != -1 && progress_update)
-   display(progress, progress->last_value, NULL);
-   }
+   throughput_string(&buf, total, rate);
+   strncpy(tp->display, buf.buf, sizeof(tp->display));
+   strbuf_release(&buf);
+   if (progress->last_value != -1 && progress_update)
+   display(progress, progress->last_value, NULL);
 }
 
 int display_progress(struct progress *progress, unsigned n)
-- 
2.0.0.406.g2e9ef9b

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 15/17] wt-status: simplify performance measurement by using getnanotime()

2014-07-11 Thread Karsten Blees
Calculating duration from a single uint64_t is simpler than from a struct
timeval. Change performance measurement for 'advice.statusuoption' from
gettimeofday() to getnanotime().

Also initialize t_begin to prevent uninitialized variable warning.

Signed-off-by: Karsten Blees 
Signed-off-by: Junio C Hamano 
---
 wt-status.c | 14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index 318a191..dfdc018 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -574,14 +574,11 @@ static void wt_status_collect_untracked(struct wt_status 
*s)
 {
int i;
struct dir_struct dir;
-   struct timeval t_begin;
+   uint64_t t_begin = getnanotime();
 
if (!s->show_untracked_files)
return;
 
-   if (advice_status_u_option)
-   gettimeofday(&t_begin, NULL);
-
memset(&dir, 0, sizeof(dir));
if (s->show_untracked_files != SHOW_ALL_UNTRACKED_FILES)
dir.flags |=
@@ -612,13 +609,8 @@ static void wt_status_collect_untracked(struct wt_status 
*s)
free(dir.ignored);
clear_directory(&dir);
 
-   if (advice_status_u_option) {
-   struct timeval t_end;
-   gettimeofday(&t_end, NULL);
-   s->untracked_in_ms =
-   (uint64_t)t_end.tv_sec * 1000 + t_end.tv_usec / 1000 -
-   ((uint64_t)t_begin.tv_sec * 1000 + t_begin.tv_usec / 
1000);
-   }
+   if (advice_status_u_option)
+   s->untracked_in_ms = (getnanotime() - t_begin) / 100;
 }
 
 void wt_status_collect(struct wt_status *s)
-- 
2.0.0.406.g2e9ef9b

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 14/17] git: add performance tracing for git's main() function to debug scripts

2014-07-11 Thread Karsten Blees
Use trace_performance to measure and print execution time and command line
arguments of the entire main() function. In constrast to the shell's 'time'
utility, which measures total time of the parent process, this logs all
involved git commands recursively. This is particularly useful to debug
performance issues of scripted commands (i.e. which git commands were
called with which parameters, and how long did they execute).

Due to git's deliberate use of exit(), the implementation uses an atexit
routine rather than just adding trace_performance_since() at the end of
main().

Usage example: > GIT_TRACE_PERFORMANCE=~/git-trace.log git stash list

Creates a log file like this:
23:57:38.638765 trace.c:405 performance: 0.000310107 s: git command: 'git' 
'rev-parse' '--git-dir'
23:57:38.644387 trace.c:405 performance: 0.000261759 s: git command: 'git' 
'rev-parse' '--show-toplevel'
23:57:38.646207 trace.c:405 performance: 0.000304468 s: git command: 'git' 
'config' '--get-colorbool' 'color.interactive'
23:57:38.648491 trace.c:405 performance: 0.000241667 s: git command: 'git' 
'config' '--get-color' 'color.interactive.help' 'red bold'
23:57:38.650465 trace.c:405 performance: 0.000243063 s: git command: 'git' 
'config' '--get-color' '' 'reset'
23:57:38.654850 trace.c:405 performance: 0.025126313 s: git command: 'git' 
'stash' 'list'

Signed-off-by: Karsten Blees 
Signed-off-by: Junio C Hamano 
---
 Documentation/git.txt |  5 +
 git.c |  2 ++
 trace.c   | 22 ++
 trace.h   |  1 +
 4 files changed, 30 insertions(+)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 9d073f6..fcb8afa 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -938,6 +938,11 @@ Unsetting the variable, or setting it to empty, "0" or
starting with "PACK".
See 'GIT_TRACE' for available trace output options.
 
+'GIT_TRACE_PERFORMANCE'::
+   Enables performance related trace messages, e.g. total execution
+   time of each Git command.
+   See 'GIT_TRACE' for available trace output options.
+
 'GIT_TRACE_SETUP'::
Enables trace messages printing the .git, working tree and current
working directory after Git has completed its setup phase.
diff --git a/git.c b/git.c
index 7780572..d4daeb8 100644
--- a/git.c
+++ b/git.c
@@ -568,6 +568,8 @@ int main(int argc, char **av)
 
git_setup_gettext();
 
+   trace_command_performance(argv);
+
/*
 * "git-" is the same as "git ", but we obviously:
 *
diff --git a/trace.c b/trace.c
index af64dbb..e583dc6 100644
--- a/trace.c
+++ b/trace.c
@@ -404,3 +404,25 @@ inline uint64_t getnanotime(void)
return now;
}
 }
+
+static uint64_t command_start_time;
+static struct strbuf command_line = STRBUF_INIT;
+
+static void print_command_performance_atexit(void)
+{
+   trace_performance_since(command_start_time, "git command:%s",
+   command_line.buf);
+}
+
+void trace_command_performance(const char **argv)
+{
+   if (!trace_want(&trace_perf_key))
+   return;
+
+   if (!command_start_time)
+   atexit(print_command_performance_atexit);
+
+   strbuf_reset(&command_line);
+   sq_quote_argv(&command_line, argv, 0);
+   command_start_time = getnanotime();
+}
diff --git a/trace.h b/trace.h
index c843e68..ae6a332 100644
--- a/trace.h
+++ b/trace.h
@@ -17,6 +17,7 @@ extern void trace_repo_setup(const char *prefix);
 extern int trace_want(struct trace_key *key);
 extern void trace_disable(struct trace_key *key);
 extern uint64_t getnanotime(void);
+extern void trace_command_performance(const char **argv);
 
 #ifndef HAVE_VARIADIC_MACROS
 
-- 
2.0.0.406.g2e9ef9b

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 13/17] trace: add trace_performance facility to debug performance issues

2014-07-11 Thread Karsten Blees
Add trace_performance and trace_performance_since macros that print a
duration and an optional printf-formatted text to the file specified in
environment variable GIT_TRACE_PERFORMANCE.

These macros, in conjunction with getnanotime(), are intended to simplify
performance measurements from within the application (i.e. profiling via
manual instrumentation, rather than using an external profiling tool).

Unless enabled via GIT_TRACE_PERFORMANCE, these macros have no noticeable
impact on performance, so that test code for well known time killers may
be shipped in release builds. Alternatively, a developer could provide an
additional performance patch (not meant for master) that allows reviewers
to reproduce performance tests more easily, e.g. on other platforms or
using their own repositories.

Usage examples:

Simple use case (measure one code section):

  uint64_t start = getnanotime();
  /* code section to measure */
  trace_performance_since(start, "foobar");

Complex use case (measure repetitive code sections):

  uint64_t t = 0;
  for (;;) {
/* ignore */
t -= getnanotime();
/* code section to measure */
t += getnanotime();
/* ignore */
  }
  trace_performance(t, "frotz");

Signed-off-by: Karsten Blees 
Signed-off-by: Junio C Hamano 
---
 trace.c | 47 +++
 trace.h | 18 ++
 2 files changed, 65 insertions(+)

diff --git a/trace.c b/trace.c
index b9d7272..af64dbb 100644
--- a/trace.c
+++ b/trace.c
@@ -169,6 +169,27 @@ void trace_strbuf_fl(const char *file, int line, struct 
trace_key *key,
print_trace_line(key, &buf);
 }
 
+static struct trace_key trace_perf_key = TRACE_KEY_INIT(PERFORMANCE);
+
+static void trace_performance_vprintf_fl(const char *file, int line,
+uint64_t nanos, const char *format,
+va_list ap)
+{
+   struct strbuf buf = STRBUF_INIT;
+
+   if (!prepare_trace_line(file, line, &trace_perf_key, &buf))
+   return;
+
+   strbuf_addf(&buf, "performance: %.9f s", (double) nanos / 10);
+
+   if (format && *format) {
+   strbuf_addstr(&buf, ": ");
+   strbuf_vaddf(&buf, format, ap);
+   }
+
+   print_trace_line(&trace_perf_key, &buf);
+}
+
 #ifndef HAVE_VARIADIC_MACROS
 
 void trace_printf(const char *format, ...)
@@ -200,6 +221,23 @@ void trace_strbuf(const char *key, const struct strbuf 
*data)
trace_strbuf_fl(NULL, 0, key, data);
 }
 
+void trace_performance(uint64_t nanos, const char *format, ...)
+{
+   va_list ap;
+   va_start(ap, format);
+   trace_performance_vprintf_fl(NULL, 0, nanos, format, ap);
+   va_end(ap);
+}
+
+void trace_performance_since(uint64_t start, const char *format, ...)
+{
+   va_list ap;
+   va_start(ap, format);
+   trace_performance_vprintf_fl(NULL, 0, getnanotime() - start,
+format, ap);
+   va_end(ap);
+}
+
 #else
 
 void trace_printf_key_fl(const char *file, int line, struct trace_key *key,
@@ -220,6 +258,15 @@ void trace_argv_printf_fl(const char *file, int line, 
const char **argv,
va_end(ap);
 }
 
+void trace_performance_fl(const char *file, int line, uint64_t nanos,
+ const char *format, ...)
+{
+   va_list ap;
+   va_start(ap, format);
+   trace_performance_vprintf_fl(file, line, nanos, format, ap);
+   va_end(ap);
+}
+
 #endif /* HAVE_VARIADIC_MACROS */
 
 
diff --git a/trace.h b/trace.h
index 4b893a5..c843e68 100644
--- a/trace.h
+++ b/trace.h
@@ -31,6 +31,14 @@ extern void trace_argv_printf(const char **argv, const char 
*format, ...);
 
 extern void trace_strbuf(struct trace_key *key, const struct strbuf *data);
 
+/* Prints elapsed time (in nanoseconds) if GIT_TRACE_PERFORMANCE is enabled. */
+__attribute__((format (printf, 2, 3)))
+extern void trace_performance(uint64_t nanos, const char *format, ...);
+
+/* Prints elapsed time since 'start' if GIT_TRACE_PERFORMANCE is enabled. */
+__attribute__((format (printf, 2, 3)))
+extern void trace_performance_since(uint64_t start, const char *format, ...);
+
 #else
 
 /*
@@ -79,6 +87,13 @@ extern void trace_strbuf(struct trace_key *key, const struct 
strbuf *data);
 #define trace_strbuf(key, data) \
trace_strbuf_fl(TRACE_CONTEXT, __LINE__, key, data)
 
+#define trace_performance(nanos, ...) \
+   trace_performance_fl(TRACE_CONTEXT, __LINE__, nanos, __VA_ARGS__)
+
+#define trace_performance_since(start, ...) \
+   trace_performance_fl(TRACE_CONTEXT, __LINE__, getnanotime() - (start), \
+__VA_ARGS__)
+
 /* backend functions, use non-*fl macros instead */
 __attribute__((format (printf, 4, 5)))
 extern void trace_printf_key_fl(const char *file, int line, struct trace_key 
*key,
@@ -88,6 +103,9 @@ extern void trace_argv_printf

[PATCH v8 12/17] trace: add high resolution timer function to debug performance issues

2014-07-11 Thread Karsten Blees
Add a getnanotime() function that returns nanoseconds since 01/01/1970 as
unsigned 64-bit integer (i.e. overflows in july 2554). This is easier to
work with than e.g. struct timeval or struct timespec. Basing the timer on
the epoch allows using the results with other time-related APIs.

To simplify adaption to different platforms, split the implementation into
a common getnanotime() and a platform-specific highres_nanos() function.

The common getnanotime() function handles errors, falling back to
gettimeofday() if highres_nanos() isn't implemented or doesn't work.

getnanotime() is also responsible for normalizing to the epoch. The offset
to the system clock is calculated only once on initialization, i.e.
manually setting the system clock has no impact on the timer (except if
the fallback gettimeofday() is in use). Git processes are typically short
lived, so we don't need to handle clock drift.

The highres_nanos() function returns monotonically increasing nanoseconds
relative to some arbitrary point in time (e.g. system boot), or 0 on
failure. Providing platform-specific implementations should be relatively
easy, e.g. adapting to clock_gettime() as defined by the POSIX realtime
extensions is seven lines of code.

This version includes highres_nanos() implementations for:
 * Linux: using clock_gettime(CLOCK_MONOTONIC)
 * Windows: using QueryPerformanceCounter()

Todo:
 * enable clock_gettime() on more platforms
 * add Mac OSX version, e.g. using mach_absolute_time + mach_timebase_info

Signed-off-by: Karsten Blees 
Signed-off-by: Junio C Hamano 
---
 Makefile |  7 +
 config.mak.uname |  1 +
 trace.c  | 82 
 trace.h  |  1 +
 4 files changed, 91 insertions(+)

diff --git a/Makefile b/Makefile
index 07ea105..80f4390 100644
--- a/Makefile
+++ b/Makefile
@@ -340,6 +340,8 @@ all::
 #
 # Define GMTIME_UNRELIABLE_ERRORS if your gmtime() function does not
 # return NULL when it receives a bogus time_t.
+#
+# Define HAVE_CLOCK_GETTIME if your platform has clock_gettime in librt.
 
 GIT-VERSION-FILE: FORCE
@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -1497,6 +1499,11 @@ ifdef GMTIME_UNRELIABLE_ERRORS
BASIC_CFLAGS += -DGMTIME_UNRELIABLE_ERRORS
 endif
 
+ifdef HAVE_CLOCK_GETTIME
+   BASIC_CFLAGS += -DHAVE_CLOCK_GETTIME
+   EXTLIBS += -lrt
+endif
+
 ifeq ($(TCLTK_PATH),)
 NO_TCLTK = NoThanks
 endif
diff --git a/config.mak.uname b/config.mak.uname
index 1ae675b..dad2618 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -34,6 +34,7 @@ ifeq ($(uname_S),Linux)
HAVE_PATHS_H = YesPlease
LIBC_CONTAINS_LIBINTL = YesPlease
HAVE_DEV_TTY = YesPlease
+   HAVE_CLOCK_GETTIME = YesPlease
 endif
 ifeq ($(uname_S),GNU/kFreeBSD)
HAVE_ALLOCA_H = YesPlease
diff --git a/trace.c b/trace.c
index f013958..b9d7272 100644
--- a/trace.c
+++ b/trace.c
@@ -275,3 +275,85 @@ int trace_want(struct trace_key *key)
 {
return !!get_trace_fd(key);
 }
+
+#ifdef HAVE_CLOCK_GETTIME
+
+static inline uint64_t highres_nanos(void)
+{
+   struct timespec ts;
+   if (clock_gettime(CLOCK_MONOTONIC, &ts))
+   return 0;
+   return (uint64_t) ts.tv_sec * 10 + ts.tv_nsec;
+}
+
+#elif defined (GIT_WINDOWS_NATIVE)
+
+static inline uint64_t highres_nanos(void)
+{
+   static uint64_t high_ns, scaled_low_ns;
+   static int scale;
+   LARGE_INTEGER cnt;
+
+   if (!scale) {
+   if (!QueryPerformanceFrequency(&cnt))
+   return 0;
+
+   /* high_ns = number of ns per cnt.HighPart */
+   high_ns = (10LL << 32) / (uint64_t) cnt.QuadPart;
+
+   /*
+* Number of ns per cnt.LowPart is 10^9 / frequency (or
+* high_ns >> 32). For maximum precision, we scale this factor
+* so that it just fits within 32 bit (i.e. won't overflow if
+* multiplied with cnt.LowPart).
+*/
+   scaled_low_ns = high_ns;
+   scale = 32;
+   while (scaled_low_ns >= 0x1LL) {
+   scaled_low_ns >>= 1;
+   scale--;
+   }
+   }
+
+   /* if QPF worked on initialization, we expect QPC to work as well */
+   QueryPerformanceCounter(&cnt);
+
+   return (high_ns * cnt.HighPart) +
+  ((scaled_low_ns * cnt.LowPart) >> scale);
+}
+
+#else
+# define highres_nanos() 0
+#endif
+
+static inline uint64_t gettimeofday_nanos(void)
+{
+   struct timeval tv;
+   gettimeofday(&tv, NULL);
+   return (uint64_t) tv.tv_sec * 10 + tv.tv_usec * 1000;
+}
+
+/*
+ * Returns nanoseconds since the epoch (01/01/1970), for performance tracing
+ * (i.e. favoring high precision over wall clock time accuracy).
+ */
+inline uint64_t getnanotime(void)
+{
+   static uint64_t offset;
+   if (of

[PATCH v8 10/17] trace: move code around, in preparation to file:line output

2014-07-11 Thread Karsten Blees
No functional changes, just move stuff around so that the next patch isn't
that ugly...

Signed-off-by: Karsten Blees 
Signed-off-by: Junio C Hamano 
---
 trace.c | 36 ++--
 trace.h | 12 
 2 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/trace.c b/trace.c
index 18e5d93..e8ce619 100644
--- a/trace.c
+++ b/trace.c
@@ -132,20 +132,20 @@ static void trace_vprintf(struct trace_key *key, const 
char *format, va_list ap)
print_trace_line(key, &buf);
 }
 
-void trace_printf_key(struct trace_key *key, const char *format, ...)
+void trace_argv_printf(const char **argv, const char *format, ...)
 {
+   struct strbuf buf = STRBUF_INIT;
va_list ap;
-   va_start(ap, format);
-   trace_vprintf(key, format, ap);
-   va_end(ap);
-}
 
-void trace_printf(const char *format, ...)
-{
-   va_list ap;
+   if (!prepare_trace_line(NULL, &buf))
+   return;
+
va_start(ap, format);
-   trace_vprintf(NULL, format, ap);
+   strbuf_vaddf(&buf, format, ap);
va_end(ap);
+
+   sq_quote_argv(&buf, argv, 0);
+   print_trace_line(NULL, &buf);
 }
 
 void trace_strbuf(struct trace_key *key, const struct strbuf *data)
@@ -159,20 +159,20 @@ void trace_strbuf(struct trace_key *key, const struct 
strbuf *data)
print_trace_line(key, &buf);
 }
 
-void trace_argv_printf(const char **argv, const char *format, ...)
+void trace_printf(const char *format, ...)
 {
-   struct strbuf buf = STRBUF_INIT;
va_list ap;
-
-   if (!prepare_trace_line(NULL, &buf))
-   return;
-
va_start(ap, format);
-   strbuf_vaddf(&buf, format, ap);
+   trace_vprintf(NULL, format, ap);
va_end(ap);
+}
 
-   sq_quote_argv(&buf, argv, 0);
-   print_trace_line(NULL, &buf);
+void trace_printf_key(struct trace_key *key, const char *format, ...)
+{
+   va_list ap;
+   va_start(ap, format);
+   trace_vprintf(key, format, ap);
+   va_end(ap);
 }
 
 static const char *quote_crnl(const char *path)
diff --git a/trace.h b/trace.h
index 28c1089..b4800e7 100644
--- a/trace.h
+++ b/trace.h
@@ -13,15 +13,19 @@ struct trace_key {
 
 #define TRACE_KEY_INIT(name) { "GIT_TRACE_" #name, 0, 0, 0 }
 
-__attribute__((format (printf, 1, 2)))
-extern void trace_printf(const char *format, ...);
-__attribute__((format (printf, 2, 3)))
-extern void trace_argv_printf(const char **argv, const char *format, ...);
 extern void trace_repo_setup(const char *prefix);
 extern int trace_want(struct trace_key *key);
 extern void trace_disable(struct trace_key *key);
+
+__attribute__((format (printf, 1, 2)))
+extern void trace_printf(const char *format, ...);
+
 __attribute__((format (printf, 2, 3)))
 extern void trace_printf_key(struct trace_key *key, const char *format, ...);
+
+__attribute__((format (printf, 2, 3)))
+extern void trace_argv_printf(const char **argv, const char *format, ...);
+
 extern void trace_strbuf(struct trace_key *key, const struct strbuf *data);
 
 #endif /* TRACE_H */
-- 
2.0.0.406.g2e9ef9b

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 11/17] trace: add 'file:line' to all trace output

2014-07-11 Thread Karsten Blees
This is useful to see where trace output came from.

Add 'const char *file, int line' parameters to the printing functions and
rename them to *_fl.

Add trace_printf* and trace_strbuf macros resolving to the *_fl functions
and let the preprocessor fill in __FILE__ and __LINE__.

As the trace_printf* functions take a variable number of arguments, this
requires variadic macros (i.e. '#define foo(...) foo_impl(__VA_ARGS__)'.
Though part of C99, it is unclear whether older compilers support this.
Thus keep the old functions and only enable variadic macros for GNUC and
MSVC 2005+ (_MSC_VER 1400). This has the nice side effect that the old
C-style declarations serve as documentation how the macros are to be used.

Print 'file:line ' as prefix to each trace line. Align the remaining trace
output at column 40 to accommodate 18 char file names + 4 digit line
number (currently there are 30 *.c files of length 18 and just 11 of 19).
Trace output from longer source files (e.g. builtin/receive-pack.c) will
not be aligned.

Signed-off-by: Karsten Blees 
---
 git-compat-util.h |  4 
 trace.c   | 72 +--
 trace.h   | 62 +++
 3 files changed, 126 insertions(+), 12 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index b6f03b3..4dd065d 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -704,6 +704,10 @@ void git_qsort(void *base, size_t nmemb, size_t size,
 #endif
 #endif
 
+#if defined(__GNUC__) || (_MSC_VER >= 1400)
+#define HAVE_VARIADIC_MACROS 1
+#endif
+
 /*
  * Preserves errno, prints a message, but gives no warning for ENOENT.
  * Always returns the return value of unlink(2).
diff --git a/trace.c b/trace.c
index e8ce619..f013958 100644
--- a/trace.c
+++ b/trace.c
@@ -85,7 +85,8 @@ void trace_disable(struct trace_key *key)
 static const char err_msg[] = "Could not trace into fd given by "
"GIT_TRACE environment variable";
 
-static int prepare_trace_line(struct trace_key *key, struct strbuf *buf)
+static int prepare_trace_line(const char *file, int line,
+ struct trace_key *key, struct strbuf *buf)
 {
static struct trace_key trace_bare = TRACE_KEY_INIT(BARE);
struct timeval tv;
@@ -108,6 +109,14 @@ static int prepare_trace_line(struct trace_key *key, 
struct strbuf *buf)
strbuf_addf(buf, "%02d:%02d:%02d.%06ld ", tm.tm_hour, tm.tm_min,
tm.tm_sec, (long) tv.tv_usec);
 
+#ifdef HAVE_VARIADIC_MACROS
+   /* print file:line */
+   strbuf_addf(buf, "%s:%d ", file, line);
+   /* align trace output (column 40 catches most files names in git) */
+   while (buf->len < 40)
+   strbuf_addch(buf, ' ');
+#endif
+
return 1;
 }
 
@@ -121,49 +130,52 @@ static void print_trace_line(struct trace_key *key, 
struct strbuf *buf)
strbuf_release(buf);
 }
 
-static void trace_vprintf(struct trace_key *key, const char *format, va_list 
ap)
+static void trace_vprintf_fl(const char *file, int line, struct trace_key *key,
+const char *format, va_list ap)
 {
struct strbuf buf = STRBUF_INIT;
 
-   if (!prepare_trace_line(key, &buf))
+   if (!prepare_trace_line(file, line, key, &buf))
return;
 
strbuf_vaddf(&buf, format, ap);
print_trace_line(key, &buf);
 }
 
-void trace_argv_printf(const char **argv, const char *format, ...)
+static void trace_argv_vprintf_fl(const char *file, int line,
+ const char **argv, const char *format,
+ va_list ap)
 {
struct strbuf buf = STRBUF_INIT;
-   va_list ap;
 
-   if (!prepare_trace_line(NULL, &buf))
+   if (!prepare_trace_line(file, line, NULL, &buf))
return;
 
-   va_start(ap, format);
strbuf_vaddf(&buf, format, ap);
-   va_end(ap);
 
sq_quote_argv(&buf, argv, 0);
print_trace_line(NULL, &buf);
 }
 
-void trace_strbuf(struct trace_key *key, const struct strbuf *data)
+void trace_strbuf_fl(const char *file, int line, struct trace_key *key,
+const struct strbuf *data)
 {
struct strbuf buf = STRBUF_INIT;
 
-   if (!prepare_trace_line(key, &buf))
+   if (!prepare_trace_line(file, line, key, &buf))
return;
 
strbuf_addbuf(&buf, data);
print_trace_line(key, &buf);
 }
 
+#ifndef HAVE_VARIADIC_MACROS
+
 void trace_printf(const char *format, ...)
 {
va_list ap;
va_start(ap, format);
-   trace_vprintf(NULL, format, ap);
+   trace_vprintf_fl(NULL, 0, NULL, format, ap);
va_end(ap);
 }
 
@@ -171,10 +183,46 @@ void trace_printf_key(struct trace_key *key, const char 
*format, ...)
 {
va_list ap;
va_

[PATCH v8 08/17] trace: disable additional trace output for unit tests

2014-07-11 Thread Karsten Blees
Some unit-tests use trace output to verify internal state, and unstable
output such as timestamps and line numbers are not useful there.

Disable additional trace output if GIT_TRACE_BARE is set.

Signed-off-by: Karsten Blees 
Signed-off-by: Junio C Hamano 
---
 t/test-lib.sh | 4 
 trace.c   | 6 ++
 2 files changed, 10 insertions(+)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 81394c8..e37da5a 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -109,6 +109,10 @@ export GIT_AUTHOR_EMAIL GIT_AUTHOR_NAME
 export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME
 export EDITOR
 
+# Tests using GIT_TRACE typically don't want  : output
+GIT_TRACE_BARE=1
+export GIT_TRACE_BARE
+
 if test -n "${TEST_GIT_INDEX_VERSION:+isset}"
 then
GIT_INDEX_VERSION="$TEST_GIT_INDEX_VERSION"
diff --git a/trace.c b/trace.c
index 3d02bcc..a194b16 100644
--- a/trace.c
+++ b/trace.c
@@ -87,11 +87,17 @@ static const char err_msg[] = "Could not trace into fd 
given by "
 
 static int prepare_trace_line(struct trace_key *key, struct strbuf *buf)
 {
+   static struct trace_key trace_bare = TRACE_KEY_INIT(BARE);
+
if (!trace_want(key))
return 0;
 
set_try_to_free_routine(NULL);  /* is never reset */
 
+   /* unit tests may want to disable additional trace output */
+   if (trace_want(&trace_bare))
+   return 1;
+
/* add line prefix here */
 
return 1;
-- 
2.0.0.406.g2e9ef9b

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 09/17] trace: add current timestamp to all trace output

2014-07-11 Thread Karsten Blees
This is useful to tell apart trace output of separate test runs.

It can also be used for basic, coarse-grained performance analysis. Note
that the accuracy is tainted by writing to the trace file, and you have to
calculate the deltas yourself (which is next to impossible if multiple
threads or processes are involved).

Signed-off-by: Karsten Blees 
Signed-off-by: Junio C Hamano 
---
 trace.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/trace.c b/trace.c
index a194b16..18e5d93 100644
--- a/trace.c
+++ b/trace.c
@@ -88,6 +88,9 @@ static const char err_msg[] = "Could not trace into fd given 
by "
 static int prepare_trace_line(struct trace_key *key, struct strbuf *buf)
 {
static struct trace_key trace_bare = TRACE_KEY_INIT(BARE);
+   struct timeval tv;
+   struct tm tm;
+   time_t secs;
 
if (!trace_want(key))
return 0;
@@ -98,7 +101,12 @@ static int prepare_trace_line(struct trace_key *key, struct 
strbuf *buf)
if (trace_want(&trace_bare))
return 1;
 
-   /* add line prefix here */
+   /* print current timestamp */
+   gettimeofday(&tv, NULL);
+   secs = tv.tv_sec;
+   localtime_r(&secs, &tm);
+   strbuf_addf(buf, "%02d:%02d:%02d.%06ld ", tm.tm_hour, tm.tm_min,
+   tm.tm_sec, (long) tv.tv_usec);
 
return 1;
 }
-- 
2.0.0.406.g2e9ef9b

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 06/17] sha1_file: change GIT_TRACE_PACK_ACCESS logging to use trace API

2014-07-11 Thread Karsten Blees
This changes GIT_TRACE_PACK_ACCESS functionality as follows:
 * supports the same options as GIT_TRACE (e.g. printing to stderr)
 * no longer supports relative paths
 * appends to the trace file rather than overwriting

Signed-off-by: Karsten Blees 
Signed-off-by: Junio C Hamano 
---
 Documentation/git.txt |  4 ++--
 sha1_file.c   | 30 --
 2 files changed, 6 insertions(+), 28 deletions(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 75633e6..9d073f6 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -925,11 +925,11 @@ Unsetting the variable, or setting it to empty, "0" or
 "false" (case insensitive) disables trace messages.
 
 'GIT_TRACE_PACK_ACCESS'::
-   If this variable is set to a path, a file will be created at
-   the given path logging all accesses to any packs. For each
+   Enables trace messages for all accesses to any packs. For each
access, the pack file name and an offset in the pack is
recorded. This may be helpful for troubleshooting some
pack-related performance problems.
+   See 'GIT_TRACE' for available trace output options.
 
 'GIT_TRACE_PACKET'::
Enables trace messages for all packets coming in or out of a
diff --git a/sha1_file.c b/sha1_file.c
index 34d527f..7a110b5 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -36,9 +36,6 @@ static inline uintmax_t sz_fmt(size_t s) { return s; }
 
 const unsigned char null_sha1[20];
 
-static const char *no_log_pack_access = "no_log_pack_access";
-static const char *log_pack_access;
-
 /*
  * This is meant to hold a *small* number of objects that you would
  * want read_sha1_file() to be able to return, but yet you do not want
@@ -2085,27 +2082,9 @@ static void *read_object(const unsigned char *sha1, enum 
object_type *type,
 
 static void write_pack_access_log(struct packed_git *p, off_t obj_offset)
 {
-   static FILE *log_file;
-
-   if (!log_pack_access)
-   log_pack_access = getenv("GIT_TRACE_PACK_ACCESS");
-   if (!log_pack_access)
-   log_pack_access = no_log_pack_access;
-   if (log_pack_access == no_log_pack_access)
-   return;
-
-   if (!log_file) {
-   log_file = fopen(log_pack_access, "w");
-   if (!log_file) {
-   error("cannot open pack access log '%s' for writing: 
%s",
- log_pack_access, strerror(errno));
-   log_pack_access = no_log_pack_access;
-   return;
-   }
-   }
-   fprintf(log_file, "%s %"PRIuMAX"\n",
-   p->pack_name, (uintmax_t)obj_offset);
-   fflush(log_file);
+   static struct trace_key pack_access = TRACE_KEY_INIT(PACK_ACCESS);
+   trace_printf_key(&pack_access, "%s %"PRIuMAX"\n",
+p->pack_name, (uintmax_t)obj_offset);
 }
 
 int do_check_packed_object_crc;
@@ -2130,8 +2109,7 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
int delta_stack_nr = 0, delta_stack_alloc = UNPACK_ENTRY_STACK_PREALLOC;
int base_from_cache = 0;
 
-   if (log_pack_access != no_log_pack_access)
-   write_pack_access_log(p, obj_offset);
+   write_pack_access_log(p, obj_offset);
 
/* PHASE 1: drill down to the innermost base object */
for (;;) {
-- 
2.0.0.406.g2e9ef9b

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 07/17] trace: add infrastructure to augment trace output with additional info

2014-07-11 Thread Karsten Blees
To be able to add a common prefix or suffix to all trace output (e.g.
a timestamp or file:line of the caller), factor out common setup and
cleanup tasks of the trace* functions.

When adding a common prefix, it makes sense that the output of each trace
call starts on a new line. Add '\n' in case the caller forgot.

Note that this explicitly limits trace output to line-by-line, it is no
longer possible to trace-print just part of a line. Until now, this was
just an implicit assumption (trace-printing part of a line worked, but
messed up the trace file if multiple threads or processes were involved).

Thread-safety / inter-process-safety is also the reason why we need to do
the prefixing and suffixing in memory rather than issuing multiple write()
calls. Write_or_whine_pipe() / xwrite() is atomic unless the size exceeds
MAX_IO_SIZE (8MB, see wrapper.c). In case of trace_strbuf, this costs an
additional string copy (which should be irrelevant for performance in light
of actual file IO).

While we're at it, rename trace_strbuf's 'buf' argument, which suggests
that the function is modifying the buffer. Trace_strbuf() currently is the
only trace API that can print arbitrary binary data (without barfing on
'%' or stopping at '\0'), so 'data' seems more appropriate.

Signed-off-by: Karsten Blees 
Signed-off-by: Junio C Hamano 
---
 trace.c | 47 +--
 trace.h |  2 +-
 2 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/trace.c b/trace.c
index 8662b79..3d02bcc 100644
--- a/trace.c
+++ b/trace.c
@@ -85,17 +85,37 @@ void trace_disable(struct trace_key *key)
 static const char err_msg[] = "Could not trace into fd given by "
"GIT_TRACE environment variable";
 
+static int prepare_trace_line(struct trace_key *key, struct strbuf *buf)
+{
+   if (!trace_want(key))
+   return 0;
+
+   set_try_to_free_routine(NULL);  /* is never reset */
+
+   /* add line prefix here */
+
+   return 1;
+}
+
+static void print_trace_line(struct trace_key *key, struct strbuf *buf)
+{
+   /* append newline if missing */
+   if (buf->len && buf->buf[buf->len - 1] != '\n')
+   strbuf_addch(buf, '\n');
+
+   write_or_whine_pipe(get_trace_fd(key), buf->buf, buf->len, err_msg);
+   strbuf_release(buf);
+}
+
 static void trace_vprintf(struct trace_key *key, const char *format, va_list 
ap)
 {
struct strbuf buf = STRBUF_INIT;
 
-   if (!trace_want(key))
+   if (!prepare_trace_line(key, &buf))
return;
 
-   set_try_to_free_routine(NULL);  /* is never reset */
strbuf_vaddf(&buf, format, ap);
-   trace_strbuf(key, &buf);
-   strbuf_release(&buf);
+   print_trace_line(key, &buf);
 }
 
 void trace_printf_key(struct trace_key *key, const char *format, ...)
@@ -114,32 +134,31 @@ void trace_printf(const char *format, ...)
va_end(ap);
 }
 
-void trace_strbuf(struct trace_key *key, const struct strbuf *buf)
+void trace_strbuf(struct trace_key *key, const struct strbuf *data)
 {
-   int fd = get_trace_fd(key);
-   if (!fd)
+   struct strbuf buf = STRBUF_INIT;
+
+   if (!prepare_trace_line(key, &buf))
return;
 
-   write_or_whine_pipe(fd, buf->buf, buf->len, err_msg);
+   strbuf_addbuf(&buf, data);
+   print_trace_line(key, &buf);
 }
 
 void trace_argv_printf(const char **argv, const char *format, ...)
 {
struct strbuf buf = STRBUF_INIT;
va_list ap;
-   int fd = get_trace_fd(NULL);
-   if (!fd)
+
+   if (!prepare_trace_line(NULL, &buf))
return;
 
-   set_try_to_free_routine(NULL);  /* is never reset */
va_start(ap, format);
strbuf_vaddf(&buf, format, ap);
va_end(ap);
 
sq_quote_argv(&buf, argv, 0);
-   strbuf_addch(&buf, '\n');
-   write_or_whine_pipe(fd, buf.buf, buf.len, err_msg);
-   strbuf_release(&buf);
+   print_trace_line(NULL, &buf);
 }
 
 static const char *quote_crnl(const char *path)
diff --git a/trace.h b/trace.h
index d85ac4c..28c1089 100644
--- a/trace.h
+++ b/trace.h
@@ -22,6 +22,6 @@ extern int trace_want(struct trace_key *key);
 extern void trace_disable(struct trace_key *key);
 __attribute__((format (printf, 2, 3)))
 extern void trace_printf_key(struct trace_key *key, const char *format, ...);
-extern void trace_strbuf(struct trace_key *key, const struct strbuf *buf);
+extern void trace_strbuf(struct trace_key *key, const struct strbuf *data);
 
 #endif /* TRACE_H */
-- 
2.0.0.406.g2e9ef9b

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 05/17] Documentation/git.txt: improve documentation of 'GIT_TRACE*' variables

2014-07-11 Thread Karsten Blees
Separate GIT_TRACE description into what it prints and how to configure
where trace output is printed to. Change other GIT_TRACE_* descriptions to
refer to GIT_TRACE.

Add descriptions for GIT_TRACE_SETUP and GIT_TRACE_SHALLOW.

Signed-off-by: Karsten Blees 
Signed-off-by: Junio C Hamano 
---
 Documentation/git.txt | 50 ++
 1 file changed, 34 insertions(+), 16 deletions(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 3bd68b0..75633e6 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -904,18 +904,25 @@ for further details.
based on whether stdout appears to be redirected to a file or not.
 
 'GIT_TRACE'::
-   If this variable is set to "1", "2" or "true" (comparison
-   is case insensitive), Git will print `trace:` messages on
-   stderr telling about alias expansion, built-in command
-   execution and external command execution.
-   If this variable is set to an integer value greater than 1
-   and lower than 10 (strictly) then Git will interpret this
-   value as an open file descriptor and will try to write the
-   trace messages into this file descriptor.
-   Alternatively, if this variable is set to an absolute path
-   (starting with a '/' character), Git will interpret this
-   as a file path and will try to write the trace messages
-   into it.
+   Enables general trace messages, e.g. alias expansion, built-in
+   command execution and external command execution.
++
+If this variable is set to "1", "2" or "true" (comparison
+is case insensitive), trace messages will be printed to
+stderr.
++
+If the variable is set to an integer value greater than 2
+and lower than 10 (strictly) then Git will interpret this
+value as an open file descriptor and will try to write the
+trace messages into this file descriptor.
++
+Alternatively, if the variable is set to an absolute path
+(starting with a '/' character), Git will interpret this
+as a file path and will try to write the trace messages
+into it.
++
+Unsetting the variable, or setting it to empty, "0" or
+"false" (case insensitive) disables trace messages.
 
 'GIT_TRACE_PACK_ACCESS'::
If this variable is set to a path, a file will be created at
@@ -925,10 +932,21 @@ for further details.
pack-related performance problems.
 
 'GIT_TRACE_PACKET'::
-   If this variable is set, it shows a trace of all packets
-   coming in or out of a given program. This can help with
-   debugging object negotiation or other protocol issues. Tracing
-   is turned off at a packet starting with "PACK".
+   Enables trace messages for all packets coming in or out of a
+   given program. This can help with debugging object negotiation
+   or other protocol issues. Tracing is turned off at a packet
+   starting with "PACK".
+   See 'GIT_TRACE' for available trace output options.
+
+'GIT_TRACE_SETUP'::
+   Enables trace messages printing the .git, working tree and current
+   working directory after Git has completed its setup phase.
+   See 'GIT_TRACE' for available trace output options.
+
+'GIT_TRACE_SHALLOW'::
+   Enables trace messages that can help debugging fetching /
+   cloning of shallow repositories.
+   See 'GIT_TRACE' for available trace output options.
 
 GIT_LITERAL_PATHSPECS::
Setting this variable to `1` will cause Git to treat all
-- 
2.0.0.406.g2e9ef9b

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 04/17] trace: improve trace performance

2014-07-11 Thread Karsten Blees
The trace API currently rechecks the environment variable and reopens the
trace file on every API call. This has the ugly side effect that errors
(e.g. file cannot be opened, or the user specified a relative path) are
also reported on every call. Performance can be improved by about factor
three by remembering the environment state and keeping the file open.

Replace the 'const char *key' parameter in the API with a pointer to a
'struct trace_key' that bundles the environment variable name with
additional, trace-internal state. Change the call sites of these APIs to
use a static 'struct trace_key' instead of a string constant.

In trace.c::get_trace_fd(), save and reuse the file descriptor in 'struct
trace_key'.

Add a 'trace_disable()' API, so that packet_trace() can cleanly disable
tracing when it encounters packed data (instead of using unsetenv()).

Signed-off-by: Karsten Blees 
---
 builtin/receive-pack.c |   2 +-
 commit.h   |   1 +
 pkt-line.c |   8 ++--
 shallow.c  |  10 ++---
 trace.c| 100 ++---
 trace.h|  16 ++--
 6 files changed, 78 insertions(+), 59 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index c323081..1451050 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -438,7 +438,7 @@ static int update_shallow_ref(struct command *cmd, struct 
shallow_info *si)
uint32_t mask = 1 << (cmd->index % 32);
int i;
 
-   trace_printf_key("GIT_TRACE_SHALLOW",
+   trace_printf_key(&trace_shallow,
 "shallow: update_shallow_ref %s\n", cmd->ref_name);
for (i = 0; i < si->shallow->nr; i++)
if (si->used_shallow[i] &&
diff --git a/commit.h b/commit.h
index a9f177b..08ef643 100644
--- a/commit.h
+++ b/commit.h
@@ -235,6 +235,7 @@ extern void assign_shallow_commits_to_refs(struct 
shallow_info *info,
   int *ref_status);
 extern int delayed_reachability_test(struct shallow_info *si, int c);
 extern void prune_shallow(int show_only);
+extern struct trace_key trace_shallow;
 
 int is_descendant_of(struct commit *, struct commit_list *);
 int in_merge_bases(struct commit *, struct commit *);
diff --git a/pkt-line.c b/pkt-line.c
index bc63b3b..8bc89b1 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -3,7 +3,7 @@
 
 char packet_buffer[LARGE_PACKET_MAX];
 static const char *packet_trace_prefix = "git";
-static const char trace_key[] = "GIT_TRACE_PACKET";
+static struct trace_key trace_packet = TRACE_KEY_INIT(PACKET);
 
 void packet_trace_identity(const char *prog)
 {
@@ -15,7 +15,7 @@ static void packet_trace(const char *buf, unsigned int len, 
int write)
int i;
struct strbuf out;
 
-   if (!trace_want(trace_key))
+   if (!trace_want(&trace_packet))
return;
 
/* +32 is just a guess for header + quoting */
@@ -27,7 +27,7 @@ static void packet_trace(const char *buf, unsigned int len, 
int write)
if ((len >= 4 && starts_with(buf, "PACK")) ||
(len >= 5 && starts_with(buf+1, "PACK"))) {
strbuf_addstr(&out, "PACK ...");
-   unsetenv(trace_key);
+   trace_disable(&trace_packet);
}
else {
/* XXX we should really handle printable utf8 */
@@ -43,7 +43,7 @@ static void packet_trace(const char *buf, unsigned int len, 
int write)
}
 
strbuf_addch(&out, '\n');
-   trace_strbuf(trace_key, &out);
+   trace_strbuf(&trace_packet, &out);
strbuf_release(&out);
 }
 
diff --git a/shallow.c b/shallow.c
index 0b267b6..de07709 100644
--- a/shallow.c
+++ b/shallow.c
@@ -325,7 +325,7 @@ void prune_shallow(int show_only)
strbuf_release(&sb);
 }
 
-#define TRACE_KEY "GIT_TRACE_SHALLOW"
+struct trace_key trace_shallow = TRACE_KEY_INIT(SHALLOW);
 
 /*
  * Step 1, split sender shallow commits into "ours" and "theirs"
@@ -334,7 +334,7 @@ void prune_shallow(int show_only)
 void prepare_shallow_info(struct shallow_info *info, struct sha1_array *sa)
 {
int i;
-   trace_printf_key(TRACE_KEY, "shallow: prepare_shallow_info\n");
+   trace_printf_key(&trace_shallow, "shallow: prepare_shallow_info\n");
memset(info, 0, sizeof(*info));
info->shallow = sa;
if (!sa)
@@ -365,7 +365,7 @@ void remove_nonexistent_theirs_shallow(struct shallow_info 
*info)
 {
unsigned char (*sha1)[20] = info->shallow->sha1;
int i, dst;
-   trace_printf_key(TRACE_KEY, "shallow: 
remove_nonexistent_theirs_shallow\n");
+   trace_printf_key(&trace_shallow, "shallow: 
remove_nonexis

[PATCH v8 03/17] trace: remove redundant printf format attribute

2014-07-11 Thread Karsten Blees
trace_printf_key() is the only non-static function that duplicates the
printf format attribute in the .c file, remove it for consistency.

Signed-off-by: Karsten Blees 
Signed-off-by: Junio C Hamano 
---
 trace.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/trace.c b/trace.c
index 37a7fa9..3e31558 100644
--- a/trace.c
+++ b/trace.c
@@ -75,7 +75,6 @@ static void trace_vprintf(const char *key, const char 
*format, va_list ap)
strbuf_release(&buf);
 }
 
-__attribute__((format (printf, 2, 3)))
 void trace_printf_key(const char *key, const char *format, ...)
 {
va_list ap;
-- 
2.0.0.406.g2e9ef9b

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 02/17] trace: consistently name the format parameter

2014-07-11 Thread Karsten Blees
The format parameter to trace_printf functions is sometimes abbreviated
'fmt'. Rename to 'format' everywhere (consistent with POSIX' printf
specification).

Signed-off-by: Karsten Blees 
Signed-off-by: Junio C Hamano 
---
 trace.c | 22 +++---
 trace.h |  2 +-
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/trace.c b/trace.c
index 08180a9..37a7fa9 100644
--- a/trace.c
+++ b/trace.c
@@ -62,7 +62,7 @@ static int get_trace_fd(const char *key, int *need_close)
 static const char err_msg[] = "Could not trace into fd given by "
"GIT_TRACE environment variable";
 
-static void trace_vprintf(const char *key, const char *fmt, va_list ap)
+static void trace_vprintf(const char *key, const char *format, va_list ap)
 {
struct strbuf buf = STRBUF_INIT;
 
@@ -70,25 +70,25 @@ static void trace_vprintf(const char *key, const char *fmt, 
va_list ap)
return;
 
set_try_to_free_routine(NULL);  /* is never reset */
-   strbuf_vaddf(&buf, fmt, ap);
+   strbuf_vaddf(&buf, format, ap);
trace_strbuf(key, &buf);
strbuf_release(&buf);
 }
 
 __attribute__((format (printf, 2, 3)))
-void trace_printf_key(const char *key, const char *fmt, ...)
+void trace_printf_key(const char *key, const char *format, ...)
 {
va_list ap;
-   va_start(ap, fmt);
-   trace_vprintf(key, fmt, ap);
+   va_start(ap, format);
+   trace_vprintf(key, format, ap);
va_end(ap);
 }
 
-void trace_printf(const char *fmt, ...)
+void trace_printf(const char *format, ...)
 {
va_list ap;
-   va_start(ap, fmt);
-   trace_vprintf("GIT_TRACE", fmt, ap);
+   va_start(ap, format);
+   trace_vprintf("GIT_TRACE", format, ap);
va_end(ap);
 }
 
@@ -106,7 +106,7 @@ void trace_strbuf(const char *key, const struct strbuf *buf)
close(fd);
 }
 
-void trace_argv_printf(const char **argv, const char *fmt, ...)
+void trace_argv_printf(const char **argv, const char *format, ...)
 {
struct strbuf buf = STRBUF_INIT;
va_list ap;
@@ -117,8 +117,8 @@ void trace_argv_printf(const char **argv, const char *fmt, 
...)
return;
 
set_try_to_free_routine(NULL);  /* is never reset */
-   va_start(ap, fmt);
-   strbuf_vaddf(&buf, fmt, ap);
+   va_start(ap, format);
+   strbuf_vaddf(&buf, format, ap);
va_end(ap);
 
sq_quote_argv(&buf, argv, 0);
diff --git a/trace.h b/trace.h
index 6cc4541..8fea50b 100644
--- a/trace.h
+++ b/trace.h
@@ -11,7 +11,7 @@ extern void trace_argv_printf(const char **argv, const char 
*format, ...);
 extern void trace_repo_setup(const char *prefix);
 extern int trace_want(const char *key);
 __attribute__((format (printf, 2, 3)))
-extern void trace_printf_key(const char *key, const char *fmt, ...);
+extern void trace_printf_key(const char *key, const char *format, ...);
 extern void trace_strbuf(const char *key, const struct strbuf *buf);
 
 #endif /* TRACE_H */
-- 
2.0.0.406.g2e9ef9b

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 01/17] trace: move trace declarations from cache.h to new trace.h

2014-07-11 Thread Karsten Blees
Also include direct dependencies (strbuf.h and git-compat-util.h for
__attribute__) so that trace.h can be used independently of cache.h, e.g.
in test programs.

Signed-off-by: Karsten Blees 
Signed-off-by: Junio C Hamano 
---
 cache.h | 13 ++---
 trace.h | 17 +
 2 files changed, 19 insertions(+), 11 deletions(-)
 create mode 100644 trace.h

diff --git a/cache.h b/cache.h
index cbe1935..466f6b3 100644
--- a/cache.h
+++ b/cache.h
@@ -7,6 +7,7 @@
 #include "advice.h"
 #include "gettext.h"
 #include "convert.h"
+#include "trace.h"
 
 #include SHA1_HEADER
 #ifndef git_SHA_CTX
@@ -1377,17 +1378,7 @@ extern void *alloc_tag_node(void);
 extern void *alloc_object_node(void);
 extern void alloc_report(void);
 
-/* trace.c */
-__attribute__((format (printf, 1, 2)))
-extern void trace_printf(const char *format, ...);
-__attribute__((format (printf, 2, 3)))
-extern void trace_argv_printf(const char **argv, const char *format, ...);
-extern void trace_repo_setup(const char *prefix);
-extern int trace_want(const char *key);
-__attribute__((format (printf, 2, 3)))
-extern void trace_printf_key(const char *key, const char *fmt, ...);
-extern void trace_strbuf(const char *key, const struct strbuf *buf);
-
+/* pkt-line.c */
 void packet_trace_identity(const char *prog);
 
 /* add */
diff --git a/trace.h b/trace.h
new file mode 100644
index 000..6cc4541
--- /dev/null
+++ b/trace.h
@@ -0,0 +1,17 @@
+#ifndef TRACE_H
+#define TRACE_H
+
+#include "git-compat-util.h"
+#include "strbuf.h"
+
+__attribute__((format (printf, 1, 2)))
+extern void trace_printf(const char *format, ...);
+__attribute__((format (printf, 2, 3)))
+extern void trace_argv_printf(const char **argv, const char *format, ...);
+extern void trace_repo_setup(const char *prefix);
+extern int trace_want(const char *key);
+__attribute__((format (printf, 2, 3)))
+extern void trace_printf_key(const char *key, const char *fmt, ...);
+extern void trace_strbuf(const char *key, const struct strbuf *buf);
+
+#endif /* TRACE_H */
-- 
2.0.0.406.g2e9ef9b

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 00/17] add performance tracing facility

2014-07-11 Thread Karsten Blees
Changes since v7:
[04]: Fixed -Wextra compiler warnings, thanks to Ramsay Jones.
[11]: Added #ifndef TRACE_CONTEXT, explained why __FILE__ ":"
  __FUNCTION__ doesn't work.
[17]: New Documentation/technical/api-trace.txt


Karsten Blees (17):
  trace: move trace declarations from cache.h to new trace.h
  trace: consistently name the format parameter
  trace: remove redundant printf format attribute
  trace: improve trace performance
  Documentation/git.txt: improve documentation of 'GIT_TRACE*' variables
  sha1_file: change GIT_TRACE_PACK_ACCESS logging to use trace API
  trace: add infrastructure to augment trace output with additional info
  trace: disable additional trace output for unit tests
  trace: add current timestamp to all trace output
  trace: move code around, in preparation to file:line output
  trace: add 'file:line' to all trace output
  trace: add high resolution timer function to debug performance issues
  trace: add trace_performance facility to debug performance issues
  git: add performance tracing for git's main() function to debug
scripts
  wt-status: simplify performance measurement by using getnanotime()
  progress: simplify performance measurement by using getnanotime()
  api-trace.txt: add trace API documentation

 Documentation/git.txt |  59 --
 Documentation/technical/api-trace.txt |  97 +
 Makefile  |   7 +
 builtin/receive-pack.c|   2 +-
 cache.h   |  13 +-
 commit.h  |   1 +
 config.mak.uname  |   1 +
 git-compat-util.h |   4 +
 git.c |   2 +
 pkt-line.c|   8 +-
 progress.c|  71 +++
 sha1_file.c   |  30 +--
 shallow.c |  10 +-
 t/test-lib.sh |   4 +
 trace.c   | 369 --
 trace.h   | 113 +++
 wt-status.c   |  14 +-
 17 files changed, 629 insertions(+), 176 deletions(-)
 create mode 100644 Documentation/technical/api-trace.txt
 create mode 100644 trace.h

-- 
2.0.0.406.g2e9ef9b

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] dir: remove PATH_MAX limitation

2014-07-11 Thread Karsten Blees
Am 12.07.2014 00:29, schrieb Junio C Hamano:
> Karsten Blees  writes:
> 
>> Anyways, I'd like to kindly withdraw this patch in favor of Duy's version.
>>
>> http://article.gmane.org/gmane.comp.version-control.git/248310
> 
> Thanks; I've already reverted it from 'next'.
> 
> Is Duy's patch still viable?
> 

I think so. It fixes the segfault with long paths on Windows as well
(Tested-by: ), uses strbuf APIs as Peff suggested, and initializes the
strbuf with PATH_MAX (i.e. no reallocs in the common case either ;-) ).

AFAICT the first two patches of that series are also completely unrelated
to the untracked-cache, so we may want to fast-track these?

[01/20] dir.c: coding style fix
[02/20] dir.h: move struct exclude declaration to top level
[03/20] prep_exclude: remove the artificial PATH_MAX limit

...perhaps with s/if (!dir->basebuf.alloc)/if (!dir->basebuf.buf)/

@Duy any reason for not signing off that series?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fixup! symlinks: remove PATH_MAX limitation

2014-07-11 Thread Karsten Blees
Rename cache_def_free to cache_def_clear as it doesn't free the struct
cache_def, but just clears its content.

Signed-off-by: Karsten Blees 
---
 cache.h | 2 +-
 preload-index.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 44aa439..378ee7f 100644
--- a/cache.h
+++ b/cache.h
@@ -1096,7 +1096,7 @@ struct cache_def {
int prefix_len_stat_func;
 };
 #define CACHE_DEF_INIT { STRBUF_INIT, 0, 0, 0 }
-static inline void cache_def_free(struct cache_def *cache)
+static inline void cache_def_clear(struct cache_def *cache)
 {
strbuf_release(&cache->path);
 }
diff --git a/preload-index.c b/preload-index.c
index 79ce8a9..c1fe3a3 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -63,7 +63,7 @@ static void *preload_thread(void *_data)
continue;
ce_mark_uptodate(ce);
} while (--nr > 0);
-   cache_def_free(&cache);
+   cache_def_clear(&cache);
return NULL;
 }
 
-- 
2.0.1.563.g66f467c.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Topic sk/mingw-unicode-spawn-args breaks tests

2014-07-11 Thread Karsten Blees
Am 10.07.2014 22:05, schrieb Johannes Sixt:
> It looks like I totally missed the topic sk/mingw-unicode-spawn-args.
> Now it's in master, and it breaks lots of test cases for me:
> 
> t0050-filesystem
> t0110-urlmatch-normalization
> t4014-format-patch
> t4041-diff-submodule-option
> t4120-apply-popt
> t4201-shortlog
> t4205-log-pretty-formats
> t4209-log-pickaxe
> t4210-log-i18n
> (I killed the test run here)
> 
> Am I doing something wrong? Does the topic depend on a particular
> version of MSYS (or DLL)?
> 
> -- Hannes
> 

After commenting out fchmod in config.c, I get similar results.

At first glance, t0050 seems to fail because the unicode file
name patches are still missing.

t4041 tries to pass ISO-8859-1 encoded bytes on the command line,
which simply doesn't work on Windows (all OS APIs 'talk' UTF-16).
We have a fix for this in the msysgit fork [1] (but unfortunately
in another branch, so Stepan couldn't know the patch is related).

I suspect the other failures also fall in these two categories.

[1] https://github.com/msysgit/git/commit/ef4a733c

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] dir: remove PATH_MAX limitation

2014-07-11 Thread Karsten Blees
Am 09.07.2014 18:33, schrieb Junio C Hamano:
> Karsten Blees  writes:
> 
>> 'git status' segfaults if a directory is longer than PATH_MAX, because
>> processing .gitignore files in prep_exclude() writes past the end of a
>> PATH_MAX-bounded buffer.
>>
>> Remove the limitation by using strbuf instead.
>>
>> Note: this fix just 'abuses' strbuf as string allocator, len is always 0.
>> prep_exclude() can probably be simplified using more strbuf APIs.
> 
> In addition to what Jeff already said, I am afraid that this may
> make things a lot worse for normal case.  By sizing the strbuf to
> absolute minimum as you dig deeper at each level, wouldn't you copy
> and reallocate the buffer a lot more, compared to the case where
> everything fits in the original buffer?
> 

strbuf uses ALLOC_GROW, which resizes in (x+16)*1.5 steps (i.e. 24, 60, 114,
195, 316...). Max path len in git is 85, and linux and WebKit are 170ish. I
don't think three or four extra reallocs per directory traversal will be
noticeable.

Anyways, I'd like to kindly withdraw this patch in favor of Duy's version.

http://article.gmane.org/gmane.comp.version-control.git/248310
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] symlinks: remove PATH_MAX limitation

2014-07-11 Thread Karsten Blees
Am 07.07.2014 20:30, schrieb Junio C Hamano:
> Karsten Blees  writes:
> 
> The above cache_def_free(cache) does not free the cache itself, but
> only its associated data, so the name cache_def_free() is somewhat
> misleading.
> 

You already merged this to master ("kb/path-max-must-go" lol), should
I send a fixup! s/cache_def_free/cache_def_clear/ or is it OK as is?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 3/4] hashmap: add simplified hashmap_get_from_hash() API

2014-07-11 Thread Karsten Blees
Am 07.07.2014 19:43, schrieb Junio C Hamano:
> Karsten Blees  writes:
> 
>> Hashmap entries are typically looked up by just a key. The hashmap_get()
>> API expects an initialized entry structure instead, to support compound
>> keys. This flexibility is currently only needed by find_dir_entry() in
>> name-hash.c (and compat/win32/fscache.c in the msysgit fork). All other
>> (currently five) call sites of hashmap_get() have to set up a near emtpy
> 
> s/emtpy/empty/;
> 
>> entry structure, resulting in duplicate code like this:
>>
>>   struct hashmap_entry keyentry;
>>   hashmap_entry_init(&keyentry, hash(key));
>>   return hashmap_get(map, &keyentry, key);
>>
>> Add a hashmap_get_from_hash() API that allows hashmap lookups by just
>> specifying the key and its hash code, i.e.:
>>
>>   return hashmap_get_from_hash(map, hash(key), key);
> 
> While I think the *_get_from_hash() is an improvement over the
> three-line sequence, I find it somewhat strange that callers of it
> still must compute the hash code themselves, instead of letting the
> hashmap itself to apply the user supplied hash function to the key
> to derive it.  After all, the map must know how to compare two
> entries via a user-supplied cmpfn given at the map initialization
> time, and the algorithm to derive the hash code falls into the same
> category, in the sense that both must stay the same during the
> lifetime of a hashmap, no?  Unless there is a situation where you
> need to be able to initialize a hashmap_entry without knowing which
> hashmap the entry will eventually be stored, it feels dubious API
> that exposes to the outside callers hashmap_entry_init() that takes
> the hash code without taking the hashmap itself.
> 
> In other words, why isn't hashmap_get() more like this:
> 
> void *hashmap_get(const struct hashmap *map, const void *key)
> {
> struct hashmap_entry keyentry;
> hashmap_entry_init(&keyentry, map->hash(key));
> return *find_entry_ptr(map, &keyentry, key);
> }
> 
> with hashmap_entry_init() purely a static helper in hashmap.c?
> 

1. Performance

Calculating hash codes is the most expensive operation when working with
hash tables (unless you already have a good hash such as sha1). And using
hash tables as a cache of some sort is probably the most common use case.
So you'll often have code like this:

1  unsigned int h = hash(key);
2  struct my_entry *e = hashmap_get_from_hash(&map, hash, key);
3  if (!e) {
4e = malloc(sizeof(*e));
5hashmap_entry_init(e, h);
6e->key = key;
6hashmap_add(&map, e);
7  }

Note that the hash code from line 1 can be reused in line 5. You cannot do
that if calculating the hash code is buried in the implementation.

Callbacks cannot be inlined either...


2. Simplicity

Most APIs take a user defined hashmap_entry structure, so you'd either need
two hash functions, or a hash function that can distinguish between the
'entry' and 'key-only' case, e.g.

  unsigned int my_hash(const struct my_entry *entry, const void *keydata)
  {
if (keydata)
  return strhash(keydata);
else
  return strhash(entry->key);
  }

Hashmap clients will typically provide small, type safe wrappers around the
hashmap API. That's perfect for calculating the hash code without breaking
encapsulation. IMO there's no need to make things more complex by requiring
another callback.


3. Compound keys

The key doesn't always consist of just a single word. E.g. for struct
dir_entry, the key is [char *name, int len]. So an API like this:

  void *hashmap_get(const struct hashmap *map, const void *key)

won't do in the general case (unless you require clients to define their
own key structure in addition to the entry structure...).

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] dir: remove PATH_MAX limitation

2014-07-11 Thread Karsten Blees
Am 05.07.2014 12:48, schrieb Duy Nguyen:
> On Sat, Jul 5, 2014 at 5:42 AM, Karsten Blees  wrote:
>> 'git status' segfaults if a directory is longer than PATH_MAX, because
>> processing .gitignore files in prep_exclude() writes past the end of a
>> PATH_MAX-bounded buffer.
>>
>> Remove the limitation by using strbuf instead.
>>
>> Note: this fix just 'abuses' strbuf as string allocator, len is always 0.
>> prep_exclude() can probably be simplified using more strbuf APIs.
> 
> FYI I had a similar patch [1] that attempted to lazily strbuf_init()
> instead so that strbuf_ API could be used.
> 
> [1] http://article.gmane.org/gmane.comp.version-control.git/248310
> 

Sorry, I missed that one.

In my version, strbuf_grow() does the lazy initialization (in fact, many
strbuf_* functions can handle memset(0) strbufs just fine).

I was simply too lazy to understand (again) how prep_exclude works exactly, and
as string calculations like that have the tendency to be just 1 char off, I went
for the obviously correct solution (i.e. s/dir->basebuf/dir->base.buf/ plus
strbuf_grow() before we write the buffer).

But IMO your version is much cleaner already.

However, api-strbuf.txt says that buf != NULL is invariant after init, and
alloc is "somehow private" :-) , so perhaps you should

-   if (!dir->basebuf.alloc)
+   if (!dir->basebuf.buf)
strbuf_init(&dir->basebuf, PATH_MAX);

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] dir: remove PATH_MAX limitation

2014-07-04 Thread Karsten Blees
'git status' segfaults if a directory is longer than PATH_MAX, because
processing .gitignore files in prep_exclude() writes past the end of a
PATH_MAX-bounded buffer.

Remove the limitation by using strbuf instead.

Note: this fix just 'abuses' strbuf as string allocator, len is always 0.
prep_exclude() can probably be simplified using more strbuf APIs.

Signed-off-by: Karsten Blees 
---
 dir.c | 35 +++
 dir.h |  4 ++--
 2 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/dir.c b/dir.c
index e65888d..8d4d83c 100644
--- a/dir.c
+++ b/dir.c
@@ -798,7 +798,7 @@ static void prep_exclude(struct dir_struct *dir, const char 
*base, int baselen)
 * path being checked. */
while ((stk = dir->exclude_stack) != NULL) {
if (stk->baselen <= baselen &&
-   !strncmp(dir->basebuf, base, stk->baselen))
+   !strncmp(dir->base.buf, base, stk->baselen))
break;
el = &group->el[dir->exclude_stack->exclude_ix];
dir->exclude_stack = stk->prev;
@@ -833,48 +833,50 @@ static void prep_exclude(struct dir_struct *dir, const 
char *base, int baselen)
stk->baselen = cp - base;
stk->exclude_ix = group->nr;
el = add_exclude_list(dir, EXC_DIRS, NULL);
-   memcpy(dir->basebuf + current, base + current,
+   strbuf_grow(&dir->base, stk->baselen);
+   memcpy(dir->base.buf + current, base + current,
   stk->baselen - current);
 
/* Abort if the directory is excluded */
if (stk->baselen) {
int dt = DT_DIR;
-   dir->basebuf[stk->baselen - 1] = 0;
+   dir->base.buf[stk->baselen - 1] = 0;
dir->exclude = last_exclude_matching_from_lists(dir,
-   dir->basebuf, stk->baselen - 1,
-   dir->basebuf + current, &dt);
-   dir->basebuf[stk->baselen - 1] = '/';
+   dir->base.buf, stk->baselen - 1,
+   dir->base.buf + current, &dt);
+   dir->base.buf[stk->baselen - 1] = '/';
if (dir->exclude &&
dir->exclude->flags & EXC_FLAG_NEGATIVE)
dir->exclude = NULL;
if (dir->exclude) {
-   dir->basebuf[stk->baselen] = 0;
+   dir->base.buf[stk->baselen] = 0;
dir->exclude_stack = stk;
return;
}
}
 
-   /* Try to read per-directory file unless path is too long */
-   if (dir->exclude_per_dir &&
-   stk->baselen + strlen(dir->exclude_per_dir) < PATH_MAX) {
-   strcpy(dir->basebuf + stk->baselen,
+   /* Try to read per-directory file */
+   if (dir->exclude_per_dir) {
+   strbuf_grow(&dir->base, stk->baselen +
+   strlen(dir->exclude_per_dir));
+   strcpy(dir->base.buf + stk->baselen,
dir->exclude_per_dir);
/*
-* dir->basebuf gets reused by the traversal, but we
+* dir->base gets reused by the traversal, but we
 * need fname to remain unchanged to ensure the src
 * member of each struct exclude correctly
 * back-references its source file.  Other invocations
 * of add_exclude_list provide stable strings, so we
 * strdup() and free() here in the caller.
 */
-   el->src = strdup(dir->basebuf);
-   add_excludes_from_file_to_list(dir->basebuf,
-   dir->basebuf, stk->baselen, el, 1);
+   el->src = strdup(dir->base.buf);
+   add_excludes_from_file_to_list(dir->base.buf,
+   dir->base.buf, stk->baselen, el, 1);
}
dir->exclude_stack = stk;
current = stk->baselen;
}
-   dir->basebuf[baselen] = '\0';
+   dir->base.buf[baselen] = '\0';
 }
 
 /*
@@ -1671,4 +1673,5 @@ void clear_directory(struct dir_struct *dir)
free(stk);
  

[PATCH 1/2] symlinks: remove PATH_MAX limitation

2014-07-04 Thread Karsten Blees
'git checkout' fails if a directory is longer than PATH_MAX, because the
lstat_cache in symlinks.c checks if the leading directory exists using
PATH_MAX-bounded string operations.

Remove the limitation by using strbuf instead.

Signed-off-by: Karsten Blees 
---

This fixes a bug on Windows with long paths [1].

[1] https://github.com/msysgit/msysgit/issues/227

 cache.h |  8 ++--
 preload-index.c |  4 ++--
 symlinks.c  | 63 +
 3 files changed, 36 insertions(+), 39 deletions(-)

diff --git a/cache.h b/cache.h
index df65231..44aa439 100644
--- a/cache.h
+++ b/cache.h
@@ -1090,12 +1090,16 @@ struct checkout {
 extern int checkout_entry(struct cache_entry *ce, const struct checkout 
*state, char *topath);
 
 struct cache_def {
-   char path[PATH_MAX + 1];
-   int len;
+   struct strbuf path;
int flags;
int track_flags;
int prefix_len_stat_func;
 };
+#define CACHE_DEF_INIT { STRBUF_INIT, 0, 0, 0 }
+static inline void cache_def_free(struct cache_def *cache)
+{
+   strbuf_release(&cache->path);
+}
 
 extern int has_symlink_leading_path(const char *name, int len);
 extern int threaded_has_symlink_leading_path(struct cache_def *, const char *, 
int);
diff --git a/preload-index.c b/preload-index.c
index 968ee25..79ce8a9 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -37,9 +37,8 @@ static void *preload_thread(void *_data)
struct thread_data *p = _data;
struct index_state *index = p->index;
struct cache_entry **cep = index->cache + p->offset;
-   struct cache_def cache;
+   struct cache_def cache = CACHE_DEF_INIT;
 
-   memset(&cache, 0, sizeof(cache));
nr = p->nr;
if (nr + p->offset > index->cache_nr)
nr = index->cache_nr - p->offset;
@@ -64,6 +63,7 @@ static void *preload_thread(void *_data)
continue;
ce_mark_uptodate(ce);
} while (--nr > 0);
+   cache_def_free(&cache);
return NULL;
 }
 
diff --git a/symlinks.c b/symlinks.c
index c2b41a8..5261e8c 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -35,12 +35,11 @@ static int longest_path_match(const char *name_a, int len_a,
return match_len;
 }
 
-static struct cache_def default_cache;
+static struct cache_def default_cache = CACHE_DEF_INIT;
 
 static inline void reset_lstat_cache(struct cache_def *cache)
 {
-   cache->path[0] = '\0';
-   cache->len = 0;
+   strbuf_reset(&cache->path);
cache->flags = 0;
/*
 * The track_flags and prefix_len_stat_func members is only
@@ -73,7 +72,7 @@ static int lstat_cache_matchlen(struct cache_def *cache,
int prefix_len_stat_func)
 {
int match_len, last_slash, last_slash_dir, previous_slash;
-   int save_flags, max_len, ret;
+   int save_flags, ret;
struct stat st;
 
if (cache->track_flags != track_flags ||
@@ -93,14 +92,14 @@ static int lstat_cache_matchlen(struct cache_def *cache,
 * the 2 "excluding" path types.
 */
match_len = last_slash =
-   longest_path_match(name, len, cache->path, cache->len,
-  &previous_slash);
+   longest_path_match(name, len, cache->path.buf,
+  cache->path.len, &previous_slash);
*ret_flags = cache->flags & track_flags & (FL_NOENT|FL_SYMLINK);
 
if (!(track_flags & FL_FULLPATH) && match_len == len)
match_len = last_slash = previous_slash;
 
-   if (*ret_flags && match_len == cache->len)
+   if (*ret_flags && match_len == cache->path.len)
return match_len;
/*
 * If we now have match_len > 0, we would know that
@@ -121,21 +120,22 @@ static int lstat_cache_matchlen(struct cache_def *cache,
 */
*ret_flags = FL_DIR;
last_slash_dir = last_slash;
-   max_len = len < PATH_MAX ? len : PATH_MAX;
-   while (match_len < max_len) {
+   if (len > cache->path.len)
+   strbuf_grow(&cache->path, len - cache->path.len);
+   while (match_len < len) {
do {
-   cache->path[match_len] = name[match_len];
+   cache->path.buf[match_len] = name[match_len];
match_len++;
-   } while (match_len < max_len && name[match_len] != '/');
-   if (match_len >= max_len && !(track_flags & FL_FULLPATH))
+   } while (match_len < len && name[match_len] != '/');
+   if (mat

[PATCH v1 3/4] hashmap: add simplified hashmap_get_from_hash() API

2014-07-02 Thread Karsten Blees
Hashmap entries are typically looked up by just a key. The hashmap_get()
API expects an initialized entry structure instead, to support compound
keys. This flexibility is currently only needed by find_dir_entry() in
name-hash.c (and compat/win32/fscache.c in the msysgit fork). All other
(currently five) call sites of hashmap_get() have to set up a near emtpy
entry structure, resulting in duplicate code like this:

  struct hashmap_entry keyentry;
  hashmap_entry_init(&keyentry, hash(key));
  return hashmap_get(map, &keyentry, key);

Add a hashmap_get_from_hash() API that allows hashmap lookups by just
specifying the key and its hash code, i.e.:

  return hashmap_get_from_hash(map, hash(key), key);

Signed-off-by: Karsten Blees 
---
 Documentation/technical/api-hashmap.txt | 14 ++
 builtin/describe.c  |  4 +---
 diffcore-rename.c   |  7 +++
 hashmap.h   |  8 
 name-hash.c |  5 ++---
 test-hashmap.c  | 11 +++
 6 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/Documentation/technical/api-hashmap.txt 
b/Documentation/technical/api-hashmap.txt
index dc21a7c..f9215d6 100644
--- a/Documentation/technical/api-hashmap.txt
+++ b/Documentation/technical/api-hashmap.txt
@@ -118,6 +118,20 @@ hashmap_entry) that has at least been initialized with the 
proper hash code
 If an entry with matching hash code is found, `key` and `keydata` are passed
 to `hashmap_cmp_fn` to decide whether the entry matches the key.
 
+`void *hashmap_get_from_hash(const struct hashmap *map, unsigned int hash, 
const void *keydata)`::
+
+   Returns the hashmap entry for the specified hash code and key data,
+   or NULL if not found.
++
+`map` is the hashmap structure.
++
+`hash` is the hash code of the entry to look up.
++
+If an entry with matching hash code is found, `keydata` is passed to
+`hashmap_cmp_fn` to decide whether the entry matches the key. The
+`entry_or_key` parameter points to a bogus hashmap_entry structure that
+should not be used in the comparison.
+
 `void *hashmap_get_next(const struct hashmap *map, const void *entry)`::
 
Returns the next equal hashmap entry, or NULL if not found. This can be
diff --git a/builtin/describe.c b/builtin/describe.c
index 57e84c8..ee6a3b9 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -58,9 +58,7 @@ static int commit_name_cmp(const struct commit_name *cn1,
 
 static inline struct commit_name *find_commit_name(const unsigned char *peeled)
 {
-   struct commit_name key;
-   hashmap_entry_init(&key, sha1hash(peeled));
-   return hashmap_get(&names, &key, peeled);
+   return hashmap_get_from_hash(&names, sha1hash(peeled), peeled);
 }
 
 static int replace_name(struct commit_name *e,
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 6fa97d4..2e44a37 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -257,15 +257,14 @@ static int find_identical_files(struct hashmap *srcs,
int renames = 0;
 
struct diff_filespec *target = rename_dst[dst_index].two;
-   struct file_similarity *p, *best, dst;
+   struct file_similarity *p, *best = NULL;
int i = 100, best_score = -1;
 
/*
 * Find the best source match for specified destination.
 */
-   best = NULL;
-   hashmap_entry_init(&dst, hash_filespec(target));
-   for (p = hashmap_get(srcs, &dst, NULL); p; p = hashmap_get_next(srcs, 
p)) {
+   p = hashmap_get_from_hash(srcs, hash_filespec(target), NULL);
+   for (; p; p = hashmap_get_next(srcs, p)) {
int score;
struct diff_filespec *source = p->filespec;
 
diff --git a/hashmap.h b/hashmap.h
index ed5425a..12f0668 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -68,6 +68,14 @@ extern void *hashmap_put(struct hashmap *map, void *entry);
 extern void *hashmap_remove(struct hashmap *map, const void *key,
const void *keydata);
 
+static inline void *hashmap_get_from_hash(const struct hashmap *map,
+   unsigned int hash, const void *keydata)
+{
+   struct hashmap_entry key;
+   hashmap_entry_init(&key, hash);
+   return hashmap_get(map, &key, keydata);
+}
+
 /* hashmap_iter functions */
 
 extern void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter);
diff --git a/name-hash.c b/name-hash.c
index 49fd508..702cd05 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -213,12 +213,11 @@ struct cache_entry *index_dir_exists(struct index_state 
*istate, const char *nam
 struct cache_entry *index_file_exists(struct index_state *istate, const char 
*name, int namelen, int icase)
 {
struct cache_entry *ce;
-   struct hashmap_entry key;
 
lazy_init_name_hash(istate);
 
-   hashmap_entry_init(&key, memihash(name, namelen));
-   ce = hashmap_get(&istate->name_hash, &key, NU

[PATCH v1 4/4] hashmap: add string interning API

2014-07-02 Thread Karsten Blees
Interning short strings with high probability of duplicates can reduce the
memory footprint and speed up comparisons.

Add strintern() and memintern() APIs that use a hashmap to manage the pool
of unique, interned strings.

Note: strintern(getenv()) could be used to sanitize git's use of getenv(),
in case we ever encounter a platform where a call to getenv() invalidates
previous getenv() results (which is allowed by POSIX).

Signed-off-by: Karsten Blees 
---
 Documentation/technical/api-hashmap.txt | 15 +
 hashmap.c   | 38 +
 hashmap.h   |  8 +++
 t/t0011-hashmap.sh  | 13 +++
 test-hashmap.c  | 14 
 5 files changed, 88 insertions(+)

diff --git a/Documentation/technical/api-hashmap.txt 
b/Documentation/technical/api-hashmap.txt
index f9215d6..00c4c29 100644
--- a/Documentation/technical/api-hashmap.txt
+++ b/Documentation/technical/api-hashmap.txt
@@ -193,6 +193,21 @@ more entries.
 `hashmap_iter_first` is a combination of both (i.e. initializes the iterator
 and returns the first entry, if any).
 
+`const char *strintern(const char *string)`::
+`const void *memintern(const void *data, size_t len)`::
+
+   Returns the unique, interned version of the specified string or data,
+   similar to the `String.intern` API in Java and .NET, respectively.
+   Interned strings remain valid for the entire lifetime of the process.
++
+Can be used as `[x]strdup()` or `xmemdupz` replacement, except that interned
+strings / data must not be modified or freed.
++
+Interned strings are best used for short strings with high probability of
+duplicates.
++
+Uses a hashmap to store the pool of interned strings.
+
 Usage example
 -
 
diff --git a/hashmap.c b/hashmap.c
index d1b8056..f693839 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -226,3 +226,41 @@ void *hashmap_iter_next(struct hashmap_iter *iter)
current = iter->map->table[iter->tablepos++];
}
 }
+
+struct pool_entry {
+   struct hashmap_entry ent;
+   size_t len;
+   unsigned char data[FLEX_ARRAY];
+};
+
+static int pool_entry_cmp(const struct pool_entry *e1,
+ const struct pool_entry *e2,
+ const unsigned char *keydata)
+{
+   return e1->data != keydata &&
+  (e1->len != e2->len || memcmp(e1->data, keydata, e1->len));
+}
+
+const void *memintern(const void *data, size_t len)
+{
+   static struct hashmap map;
+   struct pool_entry key, *e;
+
+   /* initialize string pool hashmap */
+   if (!map.tablesize)
+   hashmap_init(&map, (hashmap_cmp_fn) pool_entry_cmp, 0);
+
+   /* lookup interned string in pool */
+   hashmap_entry_init(&key, memhash(data, len));
+   key.len = len;
+   e = hashmap_get(&map, &key, data);
+   if (!e) {
+   /* not found: create it */
+   e = xmallocz(sizeof(struct pool_entry) + len);
+   hashmap_entry_init(e, key.ent.hash);
+   e->len = len;
+   memcpy(e->data, data, len);
+   hashmap_add(&map, e);
+   }
+   return e->data;
+}
diff --git a/hashmap.h b/hashmap.h
index 12f0668..507884b 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -87,4 +87,12 @@ static inline void *hashmap_iter_first(struct hashmap *map,
return hashmap_iter_next(iter);
 }
 
+/* string interning */
+
+extern const void *memintern(const void *data, size_t len);
+static inline const char *strintern(const char *string)
+{
+   return memintern(string, strlen(string));
+}
+
 #endif
diff --git a/t/t0011-hashmap.sh b/t/t0011-hashmap.sh
index 391e2b6..f97c805 100755
--- a/t/t0011-hashmap.sh
+++ b/t/t0011-hashmap.sh
@@ -237,4 +237,17 @@ test_expect_success 'grow / shrink' '
 
 '
 
+test_expect_success 'string interning' '
+
+test_hashmap "intern value1
+intern Value1
+intern value2
+intern value2
+" "value1
+Value1
+value2
+value2"
+
+'
+
 test_done
diff --git a/test-hashmap.c b/test-hashmap.c
index 3c9f67b..07aa7ec 100644
--- a/test-hashmap.c
+++ b/test-hashmap.c
@@ -234,6 +234,20 @@ int main(int argc, char *argv[])
/* print table sizes */
printf("%u %u\n", map.tablesize, map.size);
 
+   } else if (!strcmp("intern", cmd) && l1) {
+
+   /* test that strintern works */
+   const char *i1 = strintern(p1);
+   const char *i2 = strintern(p1);
+   if (strcmp(i1, p1))
+   printf("strintern(%s) returns %s\n", p1, i1);
+   else if (i1 == p1)
+   printf("strintern(%s) returns input poi

[PATCH v1 2/4] hashmap: improve struct hashmap member documentation

2014-07-02 Thread Karsten Blees
Signed-off-by: Karsten Blees 
---
 Documentation/technical/api-hashmap.txt | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/api-hashmap.txt 
b/Documentation/technical/api-hashmap.txt
index 4689968..dc21a7c 100644
--- a/Documentation/technical/api-hashmap.txt
+++ b/Documentation/technical/api-hashmap.txt
@@ -8,11 +8,19 @@ Data Structures
 
 `struct hashmap`::
 
-   The hash table structure.
+   The hash table structure. Members can be used as follows, but should
+   not be modified directly:
 +
-The `size` member keeps track of the total number of entries. The `cmpfn`
-member is a function used to compare two entries for equality. The `table` and
-`tablesize` members store the hash table and its size, respectively.
+The `size` member keeps track of the total number of entries (0 means the
+hashmap is empty).
++
+`tablesize` is the allocated size of the hash table. A non-0 value indicates
+that the hashmap is initialized. It may also be useful for statistical purposes
+(i.e. `size / tablesize` is the current load factor).
++
+`cmpfn` stores the comparison function specified in `hashmap_init()`. In
+advanced scenarios, it may be useful to change this, e.g. to switch between
+case-sensitive and case-insensitive lookup.
 
 `struct hashmap_entry`::
 
-- 
1.9.4.msysgit.0.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 1/4] hashmap: factor out getting an int hash code from a, SHA1

2014-07-02 Thread Karsten Blees
Copying the first bytes of a SHA1 is duplicated in six places, however,
the implications (wrong byte order on little-endian systems) is documented
only once.

Add a properly documented API for this.

Signed-off-by: Karsten Blees 
---
 Documentation/technical/api-hashmap.txt |  9 +
 builtin/describe.c  | 11 ++-
 decorate.c  |  5 +
 diffcore-rename.c   |  4 +---
 hashmap.h   | 11 +++
 khash.h | 11 ++-
 object.c| 13 +
 pack-objects.c  |  5 ++---
 8 files changed, 29 insertions(+), 40 deletions(-)

diff --git a/Documentation/technical/api-hashmap.txt 
b/Documentation/technical/api-hashmap.txt
index b977ae8..4689968 100644
--- a/Documentation/technical/api-hashmap.txt
+++ b/Documentation/technical/api-hashmap.txt
@@ -58,6 +58,15 @@ Functions
 +
 `strihash` and `memihash` are case insensitive versions.
 
+`unsigned int sha1hash(const unsigned char *sha1)`::
+
+   Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
+   for use in hash tables. Cryptographic hashes are supposed to have
+   uniform distribution, so in contrast to `memhash()`, this just copies
+   the first `sizeof(int)` bytes without shuffling any bits. Note that
+   the results will be different on big-endian and little-endian
+   platforms, so they should not be stored or transferred over the net!
+
 `void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t 
initial_size)`::
 
Initializes a hashmap structure.
diff --git a/builtin/describe.c b/builtin/describe.c
index 24d740c..57e84c8 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -56,17 +56,10 @@ static int commit_name_cmp(const struct commit_name *cn1,
return hashcmp(cn1->peeled, peeled ? peeled : cn2->peeled);
 }
 
-static inline unsigned int hash_sha1(const unsigned char *sha1)
-{
-   unsigned int hash;
-   memcpy(&hash, sha1, sizeof(hash));
-   return hash;
-}
-
 static inline struct commit_name *find_commit_name(const unsigned char *peeled)
 {
struct commit_name key;
-   hashmap_entry_init(&key, hash_sha1(peeled));
+   hashmap_entry_init(&key, sha1hash(peeled));
return hashmap_get(&names, &key, peeled);
 }
 
@@ -114,7 +107,7 @@ static void add_to_known_names(const char *path,
if (!e) {
e = xmalloc(sizeof(struct commit_name));
hashcpy(e->peeled, peeled);
-   hashmap_entry_init(e, hash_sha1(peeled));
+   hashmap_entry_init(e, sha1hash(peeled));
hashmap_add(&names, e);
e->path = NULL;
}
diff --git a/decorate.c b/decorate.c
index 7cb5d29..b2aac90 100644
--- a/decorate.c
+++ b/decorate.c
@@ -8,10 +8,7 @@
 
 static unsigned int hash_obj(const struct object *obj, unsigned int n)
 {
-   unsigned int hash;
-
-   memcpy(&hash, obj->sha1, sizeof(unsigned int));
-   return hash % n;
+   return sha1hash(obj->sha1) % n;
 }
 
 static void *insert_decoration(struct decoration *n, const struct object 
*base, void *decoration)
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 749a35d..6fa97d4 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -242,14 +242,12 @@ struct file_similarity {
 
 static unsigned int hash_filespec(struct diff_filespec *filespec)
 {
-   unsigned int hash;
if (!filespec->sha1_valid) {
if (diff_populate_filespec(filespec, 0))
return 0;
hash_sha1_file(filespec->data, filespec->size, "blob", 
filespec->sha1);
}
-   memcpy(&hash, filespec->sha1, sizeof(hash));
-   return hash;
+   return sha1hash(filespec->sha1);
 }
 
 static int find_identical_files(struct hashmap *srcs,
diff --git a/hashmap.h b/hashmap.h
index a816ad4..ed5425a 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -13,6 +13,17 @@ extern unsigned int strihash(const char *buf);
 extern unsigned int memhash(const void *buf, size_t len);
 extern unsigned int memihash(const void *buf, size_t len);
 
+static inline unsigned int sha1hash(const unsigned char *sha1)
+{
+   /*
+* Equivalent to 'return *(int *)sha1;', but safe on platforms that
+* don't support unaligned reads.
+*/
+   unsigned int hash;
+   memcpy(&hash, sha1, sizeof(hash));
+   return hash;
+}
+
 /* data structures */
 
 struct hashmap_entry {
diff --git a/khash.h b/khash.h
index 57ff603..06c7906 100644
--- a/khash.h
+++ b/khash.h
@@ -320,19 +320,12 @@ static const double __ac_HASH_UPPER = 0.77;
code;   

[PATCH v1 0/4] hashmap improvements

2014-07-02 Thread Karsten Blees
Here are a few small hashmap improvements, partly resulting from recent
discussion of the config-cache topic.

Karsten Blees (4):
  hashmap: factor out getting an int hash code from a SHA1
  hashmap: improve struct hashmap member documentation
  hashmap: add simplified hashmap_get_from_hash() API
  hashmap: add string interning API

 Documentation/technical/api-hashmap.txt | 54 ++---
 builtin/describe.c  | 13 ++--
 decorate.c  |  5 +--
 diffcore-rename.c   | 11 +++
 hashmap.c   | 38 +++
 hashmap.h   | 27 +
 khash.h | 11 ++-
 name-hash.c |  5 ++-
 object.c| 13 +---
 pack-objects.c  |  5 ++-
 t/t0011-hashmap.sh  | 13 
 test-hashmap.c  | 25 ++-
 12 files changed, 159 insertions(+), 61 deletions(-)

-- 
1.9.4.msysgit.0.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 11/16] trace: add 'file:line' to all trace output

2014-07-02 Thread Karsten Blees
Am 02.07.2014 20:57, schrieb Junio C Hamano:
> Karsten Blees  writes:
> 
>> +#else
>> +
>> +/*
>> + * Macros to add file:line - see above for C-style declarations of how these
>> + * should be used.
>> + *
>> + * TRACE_CONTEXT may be set to __FUNCTION__ if the compiler supports it. The
>> + * default is __FILE__, as it is consistent with assert(), and static 
>> function
>> + * names are not necessarily unique.
>> + */
>> +#define TRACE_CONTEXT __FILE__
> 
> Hmph, seeing "may be set to" forces me to wonder how.  Perhaps #ifndef/#endif
> around it?
> 

Right, shame on me. I didn't think it would be important enough to warrant a
Makefile option, but #ifndef sure wouldn't hurt.

> Also, can it be set to something like __FILE__ ":" __FUNCTION__
> which may alleviate the alleged problem of "not necessarily unique"
> perhaps?
> 

Should work with MSVC. With GCC, however, __FUNCTION__ is a string constant
supplied by the compiler, so string literal concatenation doesn't work.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 16/16] progress: simplify performance measurement by using getnanotime()

2014-07-01 Thread Karsten Blees
Calculating duration from a single uint64_t is simpler than from a struct
timeval. Change throughput measurement from gettimeofday() to
getnanotime().

Also calculate misec only if needed, and change integer division to integer
multiplication + shift, which should be slightly faster.

Signed-off-by: Karsten Blees 
---
 progress.c | 71 +++---
 1 file changed, 36 insertions(+), 35 deletions(-)

diff --git a/progress.c b/progress.c
index 261314e..412e6b1 100644
--- a/progress.c
+++ b/progress.c
@@ -12,13 +12,14 @@
 #include "gettext.h"
 #include "progress.h"
 #include "strbuf.h"
+#include "trace.h"
 
 #define TP_IDX_MAX  8
 
 struct throughput {
off_t curr_total;
off_t prev_total;
-   struct timeval prev_tv;
+   uint64_t prev_ns;
unsigned int avg_bytes;
unsigned int avg_misecs;
unsigned int last_bytes[TP_IDX_MAX];
@@ -127,65 +128,65 @@ static void throughput_string(struct strbuf *buf, off_t 
total,
 void display_throughput(struct progress *progress, off_t total)
 {
struct throughput *tp;
-   struct timeval tv;
-   unsigned int misecs;
+   uint64_t now_ns;
+   unsigned int misecs, count, rate;
+   struct strbuf buf = STRBUF_INIT;
 
if (!progress)
return;
tp = progress->throughput;
 
-   gettimeofday(&tv, NULL);
+   now_ns = getnanotime();
 
if (!tp) {
progress->throughput = tp = calloc(1, sizeof(*tp));
if (tp) {
tp->prev_total = tp->curr_total = total;
-   tp->prev_tv = tv;
+   tp->prev_ns = now_ns;
}
return;
}
tp->curr_total = total;
 
+   /* only update throughput every 0.5 s */
+   if (now_ns - tp->prev_ns <= 5)
+   return;
+
/*
-* We have x = bytes and y = microsecs.  We want z = KiB/s:
+* We have x = bytes and y = nanosecs.  We want z = KiB/s:
 *
-*  z = (x / 1024) / (y / 100)
-*  z = x / y * 100 / 1024
-*  z = x / (y * 1024 / 100)
+*  z = (x / 1024) / (y / 10)
+*  z = x / y * 10 / 1024
+*  z = x / (y * 1024 / 10)
 *  z = x / y'
 *
 * To simplify things we'll keep track of misecs, or 1024th of a sec
 * obtained with:
 *
-*  y' = y * 1024 / 100
-*  y' = y / (100 / 1024)
-*  y' = y / 977
+*  y' = y * 1024 / 10
+*  y' = y * (2^10 / 2^42) * (2^42 / 10)
+*  y' = y / 2^32 * 4398
+*  y' = (y * 4398) >> 32
 */
-   misecs = (tv.tv_sec - tp->prev_tv.tv_sec) * 1024;
-   misecs += (int)(tv.tv_usec - tp->prev_tv.tv_usec) / 977;
+   misecs = ((now_ns - tp->prev_ns) * 4398) >> 32;
 
-   if (misecs > 512) {
-   struct strbuf buf = STRBUF_INIT;
-   unsigned int count, rate;
+   count = total - tp->prev_total;
+   tp->prev_total = total;
+   tp->prev_ns = now_ns;
+   tp->avg_bytes += count;
+   tp->avg_misecs += misecs;
+   rate = tp->avg_bytes / tp->avg_misecs;
+   tp->avg_bytes -= tp->last_bytes[tp->idx];
+   tp->avg_misecs -= tp->last_misecs[tp->idx];
+   tp->last_bytes[tp->idx] = count;
+   tp->last_misecs[tp->idx] = misecs;
+   tp->idx = (tp->idx + 1) % TP_IDX_MAX;
 
-   count = total - tp->prev_total;
-   tp->prev_total = total;
-   tp->prev_tv = tv;
-   tp->avg_bytes += count;
-   tp->avg_misecs += misecs;
-   rate = tp->avg_bytes / tp->avg_misecs;
-   tp->avg_bytes -= tp->last_bytes[tp->idx];
-   tp->avg_misecs -= tp->last_misecs[tp->idx];
-   tp->last_bytes[tp->idx] = count;
-   tp->last_misecs[tp->idx] = misecs;
-   tp->idx = (tp->idx + 1) % TP_IDX_MAX;
-
-   throughput_string(&buf, total, rate);
-   strncpy(tp->display, buf.buf, sizeof(tp->display));
-   strbuf_release(&buf);
-   if (progress->last_value != -1 && progress_update)
-   display(progress, progress->last_value, NULL);
-   }
+   throughput_string(&buf, total, rate);
+   strncpy(tp->display, buf.buf, sizeof(tp->display));
+   strbuf_release(&buf);
+   if (progress->last_value != -1 && progress_update)
+   display(progress, progress->last_value, NULL);
 }
 
 int display_progress(struct progress *progress, unsigned n)
-- 
2.0.0.406.ge74f8ff

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 15/16] wt-status: simplify performance measurement by using getnanotime()

2014-07-01 Thread Karsten Blees
Calculating duration from a single uint64_t is simpler than from a struct
timeval. Change performance measurement for 'advice.statusuoption' from
gettimeofday() to getnanotime().

Also initialize t_begin to prevent uninitialized variable warning.

Signed-off-by: Karsten Blees 
---
 wt-status.c | 14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index 318a191..dfdc018 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -574,14 +574,11 @@ static void wt_status_collect_untracked(struct wt_status 
*s)
 {
int i;
struct dir_struct dir;
-   struct timeval t_begin;
+   uint64_t t_begin = getnanotime();
 
if (!s->show_untracked_files)
return;
 
-   if (advice_status_u_option)
-   gettimeofday(&t_begin, NULL);
-
memset(&dir, 0, sizeof(dir));
if (s->show_untracked_files != SHOW_ALL_UNTRACKED_FILES)
dir.flags |=
@@ -612,13 +609,8 @@ static void wt_status_collect_untracked(struct wt_status 
*s)
free(dir.ignored);
clear_directory(&dir);
 
-   if (advice_status_u_option) {
-   struct timeval t_end;
-   gettimeofday(&t_end, NULL);
-   s->untracked_in_ms =
-   (uint64_t)t_end.tv_sec * 1000 + t_end.tv_usec / 1000 -
-   ((uint64_t)t_begin.tv_sec * 1000 + t_begin.tv_usec / 
1000);
-   }
+   if (advice_status_u_option)
+   s->untracked_in_ms = (getnanotime() - t_begin) / 100;
 }
 
 void wt_status_collect(struct wt_status *s)
-- 
2.0.0.406.ge74f8ff

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   4   >