[PATCH 1/1] doc: Mention info/attributes in gitrepository-layout

2017-11-22 Thread Steffen Prohaska
Signed-off-by: Steffen Prohaska 
---
 Documentation/gitrepository-layout.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/gitrepository-layout.txt 
b/Documentation/gitrepository-layout.txt
index adf9554ad2..c60bcad44a 100644
--- a/Documentation/gitrepository-layout.txt
+++ b/Documentation/gitrepository-layout.txt
@@ -208,6 +208,10 @@ info/exclude::
'git clean' look at it but the core Git commands do not look
at it.  See also: linkgit:gitignore[5].
 
+info/attributes::
+   Defines which attributes to assign to a path, similar to per-directory
+   `.gitattributes` files.   See also: linkgit:gitattributes[5].
+
 info/sparse-checkout::
This file stores sparse checkout patterns.
See also: linkgit:git-read-tree[1].
-- 
2.15.0.5.g43a9009988



git-sha-x: idea for stronger cryptographic verification while keeping SHA1 content addresses

2017-02-26 Thread Steffen Prohaska
Hi,

Related to shattered, the recent discussion,
<https://public-inbox.org/git/20170223164306.spg2avxzukkgg...@kitenet.net>, the
past
<https://public-inbox.org/git/pine.lnx.4.58.0504291221250.18...@ppc970.osdl.org/>,
and Linus's post <https://plus.google.com/+LinusTorvalds/posts/7tp2gYWQugL>,
the idea below might be interesting.

I skimmed through the discussion but haven't read all the details.  I also
haven't been following the Git list during the last years, so it might very
well be that others have described similar ideas and the general approach has
been reject for some reason that I'm not aware of.

git-sha-x illustrates a potential solution for stronger cryptographic content
verification in Git while keeping SHA1 content pointers.

git-sha-x is available at <https://github.com/sprohaska/git-sha-x>.

git-sha-x computes a hash tree similar to the SHA1-based Git history but using
a different hash function.  The hashes can be added to the commit message and
signed with GPG to confirm the tree and the entire history with a cryptographic
strength that no longer depends on SHA1 but on the chosen git-sha-x hash and
the GPG configuration.  See `git-sha-x --help`.  Examples:

```
git-sha-x commit HEAD
git-sha-x tree HEAD
git-sha-x --sha512 commit HEAD

git-sha-x amend && git-sha-x --sha512 amend && git commit -S --amend -C HEAD
```

git-sha-x is only a proof of concept to illustrate the idea.  I do not intend
to develop it further.

If a similar approach was chosen for Git, the hashes could be managed by Git
and somehow represented in its object store as supplementary information.  Git
could incrementally compute additional hashes while it constructs history and
verify them when transferring data.

The strength of bare SHA1 ids is obviously not increased.  The strength is only
increased if the additional hashes are communicated in a verifiable way, too.
GPG signatures are one way.  Another way could be to communicate them via
a secure channel and pass them to git fetch for verification.  Assuming such an
implementation, a fetch for a commit from this repo could look like:

```bash
git fetch origin \
--sha256=8a3c72de658a4797e36bb29fc3bdc2a2863c04455a1b394ed9331f11f65ba802 \

--sha512=729de81500ce4ad70716d7641a115bd0a67984acc4d674044b25850e36d940bf631f9f6aa88768743690545ac899888fb54f65840f84853f9a8811aeb9ca
 \
ef2a4b7d216ab79630b9cd17e072a86e57f044fa
```

For practical purposes, supplementary hashes in the commit in combination with
GPG signatures should provide sufficient protection against attackers that try
to manipulate SHA1s.  For convenience, supplementary hashes could be stored in
the commit header, similar to `gpgsig`.  A hypothetical commit object could
look like:

```
tree 365c7e42fd004a1778c6d79c0437f970397a59b8
parent c2bfff12099b32425a3bcc4d0c7e6e6a169392d8
tree-sha256 2f588b9308b5203212d646fb56201608449cb4d83a5ffd6b7e6213d175a8077c
parent-sha256 090d9a3e69aa3369efac968abde859a6e42d05b631ece6d533765a35e998336c
tree-sha512 
12ae91b23733d52fa2f42b8f0bb5aeaeb111335688f387614c3b108a8cb86fa0e2cd6d19bf050f8a9308f8c1e991080507c91df53e0fc4cace3f746ec89a789a
parent-sha512 
d319889a40cf945d8c61dbe6d816e10badd49845c547df85ace4327676275eeb5ba2cd962712ddbb8f08f2db17dbc9eb46b59b5f7b7a7e05eab7df0ef89dec65
author Steffen Prohaska  1488122961 +0100
committer Steffen Prohaska  1488123452 +0100
gpgsig ...
```

GPG signatures would automatically cover the supplementary hashes.
Verification code paths would have to be added to compute the hashes from the
content to confirm that it has not been tampered with.

Since content verification would become independent from the content address,
the interpretation of the content address could be changed in the future.  The
size of 160 bits could be kept for simplicity.  But the meaning could be
changed.  For example, the first 160 bits of SHA256 could be uses as the
content address.  The remaining bits could be stored in an object supplement.
Verification code paths would combine the content address with the additional
bits to verify the SHA256.  Content pointers would keep their size.  Only the
additional SHA256 bits would be stored and used for verification.

Steffen



signature.asc
Description: Message signed with OpenPGP


[PATCH] subtree: fix AsciiDoc list item continuation

2015-01-04 Thread Steffen Prohaska
List items must be continued with '+' (see [asciidoc]).

[asciidoc] AsciiDoc user guide 17.7. List Item Continuation
<http://www.methods.co.nz/asciidoc/userguide.html#X15>

Signed-off-by: Steffen Prohaska 
---
 contrib/subtree/git-subtree.txt | 194 ++--
 1 file changed, 89 insertions(+), 105 deletions(-)

diff --git a/contrib/subtree/git-subtree.txt b/contrib/subtree/git-subtree.txt
index 8272100..54e4b4a 100644
--- a/contrib/subtree/git-subtree.txt
+++ b/contrib/subtree/git-subtree.txt
@@ -81,12 +81,11 @@ merge::
changes into the latest .  With '--squash',
creates only one commit that contains all the changes,
rather than merging in the entire history.
-
-   If you use '--squash', the merge direction doesn't
-   always have to be forward; you can use this command to
-   go back in time from v2.5 to v2.4, for example.  If your
-   merge introduces a conflict, you can resolve it in the
-   usual ways.
++
+If you use '--squash', the merge direction doesn't always have to be
+forward; you can use this command to go back in time from v2.5 to v2.4,
+for example.  If your merge introduces a conflict, you can resolve it in
+the usual ways.

 pull::
Exactly like 'merge', but parallels 'git pull' in that
@@ -107,21 +106,19 @@ split::
contents of  at the root of the project instead
of in a subdirectory.  Thus, the newly created history
is suitable for export as a separate git repository.
-   
-   After splitting successfully, a single commit id is
-   printed to stdout.  This corresponds to the HEAD of the
-   newly created tree, which you can manipulate however you
-   want.
-   
-   Repeated splits of exactly the same history are
-   guaranteed to be identical (i.e. to produce the same
-   commit ids).  Because of this, if you add new commits
-   and then re-split, the new commits will be attached as
-   commits on top of the history you generated last time,
-   so 'git merge' and friends will work as expected.
-   
-   Note that if you use '--squash' when you merge, you
-   should usually not just '--rejoin' when you split.
++
+After splitting successfully, a single commit id is printed to stdout.
+This corresponds to the HEAD of the newly created tree, which you can
+manipulate however you want.
++
+Repeated splits of exactly the same history are guaranteed to be
+identical (i.e. to produce the same commit ids).  Because of this, if
+you add new commits and then re-split, the new commits will be attached
+as commits on top of the history you generated last time, so 'git merge'
+and friends will work as expected.
++
+Note that if you use '--squash' when you merge, you should usually not
+just '--rejoin' when you split.
 
 
 OPTIONS
@@ -151,109 +148,96 @@ OPTIONS FOR add, merge, push, pull
 --squash::
This option is only valid for add, merge, push and pull
commands.
-
-   Instead of merging the entire history from the subtree
-   project, produce only a single commit that contains all
-   the differences you want to merge, and then merge that
-   new commit into your project.
-   
-   Using this option helps to reduce log clutter. People
-   rarely want to see every change that happened between
-   v1.0 and v1.1 of the library they're using, since none of the
-   interim versions were ever included in their application.
-   
-   Using '--squash' also helps avoid problems when the same
-   subproject is included multiple times in the same
-   project, or is removed and then re-added.  In such a
-   case, it doesn't make sense to combine the histories
-   anyway, since it's unclear which part of the history
-   belongs to which subtree.
-   
-   Furthermore, with '--squash', you can switch back and
-   forth between different versions of a subtree, rather
-   than strictly forward.  'git subtree merge --squash'
-   always adjusts the subtree to match the exactly
-   specified commit, even if getting to that commit would
-   require undoing some changes that were added earlier.
-   
-   Whether or not you use '--squash', changes made in your
-   local repository remain intact and can be later split
-   and send upstream to the subproject.
++
+Instead of merging the entire history from the subtree project, produce
+only a single commit that contains all the differences you want to
+merge, and then merge that new commit into your project.
++
+Using this option helps to reduce log clutter. People rarely want to see
+every change that happened between v1.0 and v1.1 of the library they're
+using, since none of the interim ve

[PATCH] sha1_file: don't convert off_t to size_t too early to avoid potential die()

2014-09-21 Thread Steffen Prohaska
xsize_t() checks if an off_t argument can be safely converted to
a size_t return value.  If the check is executed too early, it could
fail for large files on 32-bit architectures even if the size_t code
path is not taken.  Other paths might be able to handle the large file.
Specifically, index_stream_convert_blob() is able to handle a large file
if a filter is configured that returns a small result.

Signed-off-by: Steffen Prohaska 
---

This patch should be applied on top of sp/stream-clean-filter.

index_stream() might internally also be able to handle large files to
some extent.  But it uses size_t for its third argument, and we must
already die() when calling it.  It might be a good idea to convert its
interface to use off_t and push the size checks further down the stack.
In general, it might be good idea to carefully consider whether to use
off_t or size_t when passing file-related sizes around.  To me it looks
like a separate issue for a separate patch series (I have no specific
plans to prepare one).

 sha1_file.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 5b0e67a..6f18c22 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3180,17 +3180,22 @@ int index_fd(unsigned char *sha1, int fd, struct stat 
*st,
 enum object_type type, const char *path, unsigned flags)
 {
int ret;
-   size_t size = xsize_t(st->st_size);
 
+   /*
+* Call xsize_t() only when needed to avoid potentially unnecessary
+* die() for large files.
+*/
if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(path))
ret = index_stream_convert_blob(sha1, fd, path, flags);
else if (!S_ISREG(st->st_mode))
ret = index_pipe(sha1, fd, type, path, flags);
-   else if (size <= big_file_threshold || type != OBJ_BLOB ||
+   else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
 (path && would_convert_to_git(path)))
-   ret = index_core(sha1, fd, size, type, path, flags);
+   ret = index_core(sha1, fd, xsize_t(st->st_size), type, path,
+flags);
else
-   ret = index_stream(sha1, fd, size, type, path, flags);
+   ret = index_stream(sha1, fd, xsize_t(st->st_size), type, path,
+  flags);
close(fd);
return ret;
 }
-- 
2.1.0.139.g351b19f

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


How to update index stat info without reading file content?

2014-09-11 Thread Steffen Prohaska
Hi,

Is there a way to update the stat information recorded in the index without 
reading the file content from disk?

Starting from a clean working copy with a committed `file`, I'd like 

touch file
git  file

to bring the index into essentially the same state as

touch file
git status

but without reading the content of `file`.  (I'd be willing to wait a bit 
between touch and the magic command, to avoid any racy-git-related code paths.)

`git-update-index --assume-unchanged` is related.  But it requires completely 
manual handling of changes, because git will not look at marked files until 
told otherwise with `--no-assume-unchanged`.  I'd like to tell git only that 
the current file content is known to be up-to-date.  Any future changes should 
be handled as usual.

In the documentation, `git add --refresh file` sounds a bit like what I'm 
looking for.  The documentation of `--refresh` states: "Don’t add the file(s), 
but only refresh their stat() information in the index."  But it doesn't do 
what I want.

I looked a bit into `read-cache.c`.  The comment above `refresh_cache_ent()` 
sounds promising:

 "refresh" does not calculate a new sha1 file or bring the
 cache up-to-date for mode/content changes. But what it
 _does_ do is to "re-match" the stat information of a file
 with the cache, so that you can refresh the cache for a
 file that hasn't been changed but where the stat entry is
 out of date.

But it isn't obvious to me whether what I'm looking for is available.  All code 
paths that eventually reach `fill_stat_cache_info()` seem to go through 
`ce_compare_data()` and therefore `index_fd()`, which reads the data from disk.

Steffen--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Documentation: use single-parameter --cacheinfo in example

2014-09-11 Thread Steffen Prohaska
The single-parameter form is described as the preferred way.  Separate
arguments are only supported for backward compatibility.  Update the
example to the recommended form.

Signed-off-by: Steffen Prohaska 
---
 Documentation/git-update-index.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-update-index.txt 
b/Documentation/git-update-index.txt
index dfc09d9..82eca6f 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -202,7 +202,7 @@ merging.
 To pretend you have a file with mode and sha1 at path, say:
 
 
-$ git update-index --cacheinfo mode sha1 path
+$ git update-index --cacheinfo ,,
 
 
 '--info-only' is used to register files without placing them in the object
-- 
2.1.0.137.gbf198b9

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 5/6] Change copy_fd() to not close input fd

2014-08-28 Thread Steffen Prohaska

On Aug 26, 2014, at 8:29 PM, Jeff King  wrote:

> On Tue, Aug 26, 2014 at 05:23:24PM +0200, Steffen Prohaska wrote:
> 
>> The caller opened the fd, so it should be responsible for closing it.
>> 
>> Signed-off-by: Steffen Prohaska 
>> ---
>> copy.c | 5 +
>> lockfile.c | 3 +++
>> 2 files changed, 4 insertions(+), 4 deletions(-)
>> 
>> diff --git a/copy.c b/copy.c
>> index a7f58fd..d0a1d82 100644
>> --- a/copy.c
>> +++ b/copy.c
>> @@ -10,7 +10,6 @@ int copy_fd(int ifd, int ofd)
>>  break;
>>  if (len < 0) {
>>  int read_error = errno;
>> -close(ifd);
>>  return error("copy-fd: read returned %s",
>>   strerror(read_error));
>>  }
> 
> This saved errno is not necessary anymore (the problem was that close()
> clobbered the error in the original code). It can go away, and we can
> even drop the curly braces.
> 
>> @@ -21,17 +20,14 @@ int copy_fd(int ifd, int ofd)
>>  len -= written;
>>  }
>>  else if (!written) {
>> -close(ifd);
>>  return error("copy-fd: write returned 0");
>>  } else {
>>  int write_error = errno;
>> -close(ifd);
>>  return error("copy-fd: write returned %s",
>>   strerror(write_error));
>>  }
>>  }
> 
> Ditto here. Actually, isn't this whole write just a reimplementation of
> write_in_full? The latter treats a return of 0 as ENOSPC rather than
> using a custom message, but I think that is sane.
> 
> All together:

Makes all sense, and seems sane to me, too.

Junio, I saw that you have the changes on pu with 'SQUASH???...'.  Will you
squash it, or shall I send another complete update of the patch series?

Steffen



> ---
> copy.c | 28 +---
> 1 file changed, 5 insertions(+), 23 deletions(-)
> 
> diff --git a/copy.c b/copy.c
> index a7f58fd..53a9ece 100644
> --- a/copy.c
> +++ b/copy.c
> @@ -4,34 +4,16 @@ int copy_fd(int ifd, int ofd)
> {
>   while (1) {
>   char buffer[8192];
> - char *buf = buffer;
>   ssize_t len = xread(ifd, buffer, sizeof(buffer));
>   if (!len)
>   break;
> - if (len < 0) {
> - int read_error = errno;
> - close(ifd);
> + if (len < 0)
>   return error("copy-fd: read returned %s",
> -  strerror(read_error));
> - }
> - while (len) {
> - int written = xwrite(ofd, buf, len);
> - if (written > 0) {
> - buf += written;
> - len -= written;
> - }
> - else if (!written) {
> - close(ifd);
> - return error("copy-fd: write returned 0");
> - } else {
> - int write_error = errno;
> - close(ifd);
> - return error("copy-fd: write returned %s",
> -  strerror(write_error));
> - }
> - }
> +  strerror(errno));
> + if (write_in_full(ofd, buffer, len) < 0)
> + return error("copy-fd: write returned %s",
> +  strerror(errno));
>   }
> - close(ifd);
>   return 0;
> }
> 

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 2/6] Add git_env_ulong() to parse environment variable

2014-08-28 Thread Steffen Prohaska

On Aug 27, 2014, at 4:47 PM, Junio C Hamano  wrote:

> Jeff King  writes:
> 
>> On Tue, Aug 26, 2014 at 02:54:11PM -0700, Junio C Hamano wrote:
>> 
>>> A worse position is to have git_env_bool() that says "empty is
>>> false" and add a new git_env_ulong() that says "empty is unset".
>>> 
>>> We should pick one or the other and use it for both.
>> 
>> Yeah, I agree they should probably behave the same.
>> 
 The middle ground would be to die(). That does not seem super-friendly, but
 then we would also die with GIT_SMART_HTTP=foobar, so perhaps it is not
 unreasonable to just consider it a syntax error.
>>> 
>>> Hmm, I am not sure if dying is better.  Unless we decide to make
>>> empty string no longer false everywhere and warn now and then later
>>> die as part of a 3.0 transition plan or something, that is.
>> 
>> I think it is better in the sense that while it may be unexpected, it
>> does not unexpectedly do something that the user cannot easily undo.
>> 
>> I really do not think this topic is worth the effort of a long-term
>> deprecation scheme (which I agree _is_ required for a change to the
>> config behavior). Let's just leave it as-is. We've seen zero real-world
>> complaints, only my own surprise after reading the code (and Steffen's
>> patch should be tweaked to match).
> 
> OK, then let's do that at least for now and move on.

Ok.  I saw that you tweaked my patch on pu.  Maybe remove the outdated
comment above the function completely:

diff --git a/config.c b/config.c
index 87db755..010bcd0 100644
--- a/config.c
+++ b/config.c
@@ -1122,9 +1122,6 @@ int git_env_bool(const char *k, int def)
return v ? git_config_bool(k, v) : def;
 }

-/*
- * Use default if environment variable is unset or empty string.
- */
 unsigned long git_env_ulong(const char *k, unsigned long val)
 {
const char *v = getenv(k);

Steffen--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v6 0/6] Stream fd to clean filter; GIT_MMAP_LIMIT, GIT_ALLOC_LIMIT with git_env_ulong()

2014-08-26 Thread Steffen Prohaska
Changes since v5:

 - Introduce and use git_env_ulong().
 - Change copy_fd() to not close input fd, which simplified changes to convert.
 - More detailed explanation why filter must be marked "required" in commit
   message of 6/6.
 - Style fixes.

Steffen Prohaska (6):
  convert: drop arguments other than 'path' from would_convert_to_git()
  Add git_env_ulong() to parse environment variable
  Change GIT_ALLOC_LIMIT check to use git_env_ulong()
  Introduce GIT_MMAP_LIMIT to allow testing expected mmap size
  Change copy_fd() to not close input fd
  convert: stream from fd to required clean filter to reduce used
address space

 cache.h   |  1 +
 config.c  | 11 +++
 convert.c | 55 +--
 convert.h | 10 +++---
 copy.c|  5 +
 lockfile.c|  3 +++
 sha1_file.c   | 47 ---
 t/t0021-conversion.sh | 24 +-
 t/t1050-large.sh  |  2 +-
 wrapper.c | 15 +++---
 10 files changed, 144 insertions(+), 29 deletions(-)

-- 
2.1.0.8.gf3a29c8

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v6 3/6] Change GIT_ALLOC_LIMIT check to use git_env_ulong()

2014-08-26 Thread Steffen Prohaska
GIT_ALLOC_LIMIT limits xmalloc()'s size, which is of type size_t.
Better use git_env_ulong() to parse the environment variable, so that
the postfixes 'k', 'm', and 'g' can be used; and use size_t to store the
limit for consistency.  The change to size_t has no direct practical
impact, because we use GIT_ALLOC_LIMIT to test small sizes.

The cast of size in the call to die() is changed to uintmax_t to match
the format string PRIuMAX.

Signed-off-by: Steffen Prohaska 
---
 t/t1050-large.sh |  2 +-
 wrapper.c| 15 ---
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index aea4936..e7657ab 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -13,7 +13,7 @@ test_expect_success setup '
echo X | dd of=large2 bs=1k seek=2000 &&
echo X | dd of=large3 bs=1k seek=2000 &&
echo Y | dd of=huge bs=1k seek=2500 &&
-   GIT_ALLOC_LIMIT=1500 &&
+   GIT_ALLOC_LIMIT=1500k &&
export GIT_ALLOC_LIMIT
 '
 
diff --git a/wrapper.c b/wrapper.c
index bc1bfb8..c5204f7 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -11,14 +11,15 @@ static void (*try_to_free_routine)(size_t size) = 
do_nothing;
 
 static void memory_limit_check(size_t size)
 {
-   static int limit = -1;
-   if (limit == -1) {
-   const char *env = getenv("GIT_ALLOC_LIMIT");
-   limit = env ? atoi(env) * 1024 : 0;
+   static size_t limit = 0;
+   if (!limit) {
+   limit = git_env_ulong("GIT_ALLOC_LIMIT", 0);
+   if (!limit)
+   limit = SIZE_MAX;
}
-   if (limit && size > limit)
-   die("attempting to allocate %"PRIuMAX" over limit %d",
-   (intmax_t)size, limit);
+   if (size > limit)
+   die("attempting to allocate %"PRIuMAX" over limit %"PRIuMAX,
+   (uintmax_t)size, (uintmax_t)limit);
 }
 
 try_to_free_t set_try_to_free_routine(try_to_free_t routine)
-- 
2.1.0.8.gf3a29c8

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v6 6/6] convert: stream from fd to required clean filter to reduce used address space

2014-08-26 Thread Steffen Prohaska
The data is streamed to the filter process anyway.  Better avoid mapping
the file if possible.  This is especially useful if a clean filter
reduces the size, for example if it computes a sha1 for binary data,
like git media.  The file size that the previous implementation could
handle was limited by the available address space; large files for
example could not be handled with (32-bit) msysgit.  The new
implementation can filter files of any size as long as the filter output
is small enough.

The new code path is only taken if the filter is required.  The filter
consumes data directly from the fd.  If it fails, the original data is
not immediately available.  The condition can easily be handled as
a fatal error, which is expected for a required filter anyway.

If the filter was not required, the condition would need to be handled
in a different way, like seeking to 0 and reading the data.  But this
would require more restructuring of the code and is probably not worth
it.  The obvious approach of falling back to reading all data would not
help achieving the main purpose of this patch, which is to handle large
files with limited address space.  If reading all data is an option, we
can simply take the old code path right away and mmap the entire file.

The environment variable GIT_MMAP_LIMIT, which has been introduced in
a previous commit is used to test that the expected code path is taken.
A related test that exercises required filters is modified to verify
that the data actually has been modified on its way from the file system
to the object store.

Signed-off-by: Steffen Prohaska 
---
 convert.c | 55 +--
 convert.h |  5 +
 sha1_file.c   | 27 -
 t/t0021-conversion.sh | 24 +-
 4 files changed, 99 insertions(+), 12 deletions(-)

diff --git a/convert.c b/convert.c
index cb5fbb4..677d339 100644
--- a/convert.c
+++ b/convert.c
@@ -312,11 +312,12 @@ static int crlf_to_worktree(const char *path, const char 
*src, size_t len,
 struct filter_params {
const char *src;
unsigned long size;
+   int fd;
const char *cmd;
const char *path;
 };
 
-static int filter_buffer(int in, int out, void *data)
+static int filter_buffer_or_fd(int in, int out, void *data)
 {
/*
 * Spawn cmd and feed the buffer contents through its stdin.
@@ -355,7 +356,12 @@ static int filter_buffer(int in, int out, void *data)
 
sigchain_push(SIGPIPE, SIG_IGN);
 
-   write_err = (write_in_full(child_process.in, params->src, params->size) 
< 0);
+   if (params->src) {
+   write_err = (write_in_full(child_process.in, params->src, 
params->size) < 0);
+   } else {
+   write_err = copy_fd(params->fd, child_process.in);
+   }
+
if (close(child_process.in))
write_err = 1;
if (write_err)
@@ -371,7 +377,7 @@ static int filter_buffer(int in, int out, void *data)
return (write_err || status);
 }
 
-static int apply_filter(const char *path, const char *src, size_t len,
+static int apply_filter(const char *path, const char *src, size_t len, int fd,
 struct strbuf *dst, const char *cmd)
 {
/*
@@ -392,11 +398,12 @@ static int apply_filter(const char *path, const char 
*src, size_t len,
return 1;
 
memset(&async, 0, sizeof(async));
-   async.proc = filter_buffer;
+   async.proc = filter_buffer_or_fd;
async.data = ¶ms;
async.out = -1;
params.src = src;
params.size = len;
+   params.fd = fd;
params.cmd = cmd;
params.path = path;
 
@@ -747,6 +754,25 @@ static void convert_attrs(struct conv_attrs *ca, const 
char *path)
}
 }
 
+int would_convert_to_git_filter_fd(const char *path)
+{
+   struct conv_attrs ca;
+
+   convert_attrs(&ca, path);
+   if (!ca.drv)
+   return 0;
+
+   /*
+* Apply a filter to an fd only if the filter is required to succeed.
+* We must die if the filter fails, because the original data before
+* filtering is not available.
+*/
+   if (!ca.drv->required)
+   return 0;
+
+   return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
+}
+
 int convert_to_git(const char *path, const char *src, size_t len,
struct strbuf *dst, enum safe_crlf checksafe)
 {
@@ -761,7 +787,7 @@ int convert_to_git(const char *path, const char *src, 
size_t len,
required = ca.drv->required;
}
 
-   ret |= apply_filter(path, src, len, dst, filter);
+   ret |= apply_filter(path, src, len, -1, dst, filter);
if (!ret && required)
die("%s: clean filter '%s' failed", path, ca.drv->name);
 
@@ -778,6 +804,23 @@ int convert_to_git(const char *path, const char *src, 
siz

[PATCH v6 4/6] Introduce GIT_MMAP_LIMIT to allow testing expected mmap size

2014-08-26 Thread Steffen Prohaska
Similar to testing expectations about malloc with GIT_ALLOC_LIMIT (see
commit d41489 and previous commit), it can be useful to test
expectations about mmap.

This introduces a new environment variable GIT_MMAP_LIMIT to limit the
largest allowed mmap length.  xmmap() is modified to check the limit.
Together with GIT_ALLOC_LIMIT tests can now easily confirm expectations
about memory consumption.

GIT_MMAP_LIMIT will be used in the next commit to test that data will be
streamed to an external filter without mmaping the entire file.

[commit d41489]: d41489a6424308dc9a0409bc2f6845aa08bd4f7d Add more large
blob test cases

Signed-off-by: Steffen Prohaska 
---
 sha1_file.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 00c07f2..d9b5157 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -663,10 +663,26 @@ void release_pack_memory(size_t need)
; /* nothing */
 }
 
+static void mmap_limit_check(size_t length)
+{
+   static size_t limit = 0;
+   if (!limit) {
+   limit = git_env_ulong("GIT_MMAP_LIMIT", 0);
+   if (!limit)
+   limit = SIZE_MAX;
+   }
+   if (length > limit)
+   die("attempting to mmap %"PRIuMAX" over limit %"PRIuMAX,
+   (uintmax_t)length, (uintmax_t)limit);
+}
+
 void *xmmap(void *start, size_t length,
int prot, int flags, int fd, off_t offset)
 {
-   void *ret = mmap(start, length, prot, flags, fd, offset);
+   void *ret;
+
+   mmap_limit_check(length);
+   ret = mmap(start, length, prot, flags, fd, offset);
if (ret == MAP_FAILED) {
if (!length)
return NULL;
-- 
2.1.0.8.gf3a29c8

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v6 2/6] Add git_env_ulong() to parse environment variable

2014-08-26 Thread Steffen Prohaska
The new function will be used to parse GIT_MMAP_LIMIT and
GIT_ALLOC_LIMIT.

Signed-off-by: Steffen Prohaska 
---
 cache.h  |  1 +
 config.c | 11 +++
 2 files changed, 12 insertions(+)

diff --git a/cache.h b/cache.h
index fcb511d..b820b6a 100644
--- a/cache.h
+++ b/cache.h
@@ -1318,6 +1318,7 @@ extern int git_config_rename_section_in_file(const char 
*, const char *, const c
 extern const char *git_etc_gitconfig(void);
 extern int check_repository_format_version(const char *var, const char *value, 
void *cb);
 extern int git_env_bool(const char *, int);
+extern unsigned long git_env_ulong(const char *, unsigned long);
 extern int git_config_system(void);
 extern int config_error_nonbool(const char *);
 #if defined(__GNUC__)
diff --git a/config.c b/config.c
index 058505c..178721f 100644
--- a/config.c
+++ b/config.c
@@ -1122,6 +1122,17 @@ int git_env_bool(const char *k, int def)
return v ? git_config_bool(k, v) : def;
 }
 
+/*
+ * Use default if environment variable is unset or empty string.
+ */
+unsigned long git_env_ulong(const char *k, unsigned long val)
+{
+   const char *v = getenv(k);
+   if (v && *v && !git_parse_ulong(v, &val))
+   die("failed to parse %s", k);
+   return val;
+}
+
 int git_config_system(void)
 {
return !git_env_bool("GIT_CONFIG_NOSYSTEM", 0);
-- 
2.1.0.8.gf3a29c8

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v6 5/6] Change copy_fd() to not close input fd

2014-08-26 Thread Steffen Prohaska
The caller opened the fd, so it should be responsible for closing it.

Signed-off-by: Steffen Prohaska 
---
 copy.c | 5 +
 lockfile.c | 3 +++
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/copy.c b/copy.c
index a7f58fd..d0a1d82 100644
--- a/copy.c
+++ b/copy.c
@@ -10,7 +10,6 @@ int copy_fd(int ifd, int ofd)
break;
if (len < 0) {
int read_error = errno;
-   close(ifd);
return error("copy-fd: read returned %s",
 strerror(read_error));
}
@@ -21,17 +20,14 @@ int copy_fd(int ifd, int ofd)
len -= written;
}
else if (!written) {
-   close(ifd);
return error("copy-fd: write returned 0");
} else {
int write_error = errno;
-   close(ifd);
return error("copy-fd: write returned %s",
 strerror(write_error));
}
}
}
-   close(ifd);
return 0;
 }
 
@@ -60,6 +56,7 @@ int copy_file(const char *dst, const char *src, int mode)
return fdo;
}
status = copy_fd(fdi, fdo);
+   close(fdi);
if (close(fdo) != 0)
return error("%s: close error: %s", dst, strerror(errno));
 
diff --git a/lockfile.c b/lockfile.c
index 2564a7f..2448d30 100644
--- a/lockfile.c
+++ b/lockfile.c
@@ -224,8 +224,11 @@ int hold_lock_file_for_append(struct lock_file *lk, const 
char *path, int flags)
} else if (copy_fd(orig_fd, fd)) {
if (flags & LOCK_DIE_ON_ERROR)
exit(128);
+   close(orig_fd);
close(fd);
return -1;
+   } else {
+   close(orig_fd);
}
return fd;
 }
-- 
2.1.0.8.gf3a29c8

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v6 1/6] convert: drop arguments other than 'path' from would_convert_to_git()

2014-08-26 Thread Steffen Prohaska
It is only the path that matters in the decision whether to filter or
not.  Clarify this by making path the single argument of
would_convert_to_git().

Signed-off-by: Steffen Prohaska 
---
 convert.h   | 5 ++---
 sha1_file.c | 2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/convert.h b/convert.h
index 0c2143c..c638b33 100644
--- a/convert.h
+++ b/convert.h
@@ -40,10 +40,9 @@ extern int convert_to_working_tree(const char *path, const 
char *src,
   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
  struct strbuf *dst);
-static inline int would_convert_to_git(const char *path, const char *src,
-  size_t len, enum safe_crlf checksafe)
+static inline int would_convert_to_git(const char *path)
 {
-   return convert_to_git(path, src, len, NULL, checksafe);
+   return convert_to_git(path, NULL, 0, NULL, 0);
 }
 
 /*
diff --git a/sha1_file.c b/sha1_file.c
index 3f70b1d..00c07f2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3144,7 +3144,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
if (!S_ISREG(st->st_mode))
ret = index_pipe(sha1, fd, type, path, flags);
else if (size <= big_file_threshold || type != OBJ_BLOB ||
-(path && would_convert_to_git(path, NULL, 0, 0)))
+(path && would_convert_to_git(path)))
ret = index_core(sha1, fd, size, type, path, flags);
else
ret = index_stream(sha1, fd, size, type, path, flags);
-- 
2.1.0.8.gf3a29c8

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap

2014-08-25 Thread Steffen Prohaska

On Aug 25, 2014, at 2:43 PM, Jeff King  wrote:

> On Sun, Aug 24, 2014 at 06:07:46PM +0200, Steffen Prohaska wrote:
> 
>> The data is streamed to the filter process anyway.  Better avoid mapping
>> the file if possible.  This is especially useful if a clean filter
>> reduces the size, for example if it computes a sha1 for binary data,
>> like git media.  The file size that the previous implementation could
>> handle was limited by the available address space; large files for
>> example could not be handled with (32-bit) msysgit.  The new
>> implementation can filter files of any size as long as the filter output
>> is small enough.
>> 
>> The new code path is only taken if the filter is required.  The filter
>> consumes data directly from the fd.  The original data is not available
>> to git, so it must fail if the filter fails.
> 
> Can you clarify this second paragraph a bit more? If I understand
> correctly, we handle a non-required filter failing by just reading the
> data again (which we can do because we either read it into memory
> ourselves, or mmap it).

We don't read the data again.  convert_to_git() assumes that it is already
in memory and simply keeps the original buffer if the filter fails.


> With the streaming approach, we will read the
> whole file through our stream; if that fails we would then want to read
> the stream from the start.
> 
> Couldn't we do that with an lseek (or even an mmap with offset 0)? That
> obviously would not work for non-file inputs, but I think we address
> that already in index_fd: we push non-seekable things off to index_pipe,
> where we spool them to memory.

It could be handled that way, but we would be back to the original problem
that 32-bit git fails for large files.  The convert code path currently
assumes that all data is available in a single buffer at some point to apply
crlf and ident filters.

If the initial filter, which is assumed to reduce the file size, fails, we
could seek to 0 and read the entire file.  But git would then fail for large
files with out-of-memory.  We would not gain anything for the use case that
I describe in the commit message's first paragraph.

To implement something like the ideal strategy below, the entire convert 
machinery for crlf and ident would have to be converted to a streaming
approach.  Another option would be to detect that only the clean filter
would be applied and not crlf and ident.  Maybe we could get away with
something simpler then.

But I think that if the clean filter's purpose is to reduce file size, it
does not make sense to try to handle the case of a failing filter with a 
fallback plan.  The filter should simply be marked "required", because
any sane operation requires it.


> So it seems like the ideal strategy would be:
> 
>  1. If it's seekable, try streaming. If not, fall back to lseek/mmap.
> 
>  2. If it's not seekable and the filter is required, try streaming. We
> die anyway if we fail.
> 
>  3. If it's not seekable and the filter is not required, decide based
> on file size:
> 
>   a. If it's small, spool to memory and proceed as we do now.
> 
>   b. If it's big, spool to a seekable tempfile.
> 
> Your patch implements part 2. But I would think part 1 is the most common
> case. And while part 3b seems unpleasant, it is better than the current
> code (with or without your patch), which will do 3a on a large file.
> 
> Hmm. Though I guess in (3) we do not have the size up front, so it's
> complicated (we could spool N bytes to memory, then start dumping to a
> file after that). I do not think we necessarily need to implement that
> part, though. It seems like (1) is the thing I would expect to hit the
> most (i.e., people do not always mark their filters are "required").

Well, I think they have to mark it if the filter's purpose is to reduce size.

I'll add a bit of the discussion to the commit message.  I'm not convinced
that we should do more at this point.


>> +} else {
>> +/* dup(), because copy_fd() closes the input fd. */
>> +fd = dup(params->fd);
> 
> Not a problem you are introducing, but this seem kind of like a
> misfeature in copy_fd. Is it worth fixing? The function only has two
> existing callers.

I found it confusing.  I think it's worth fixing.

Steffen
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong()

2014-08-25 Thread Steffen Prohaska

On Aug 25, 2014, at 1:38 PM, Jeff King  wrote:

> On Sun, Aug 24, 2014 at 06:07:44PM +0200, Steffen Prohaska wrote:
> 
>> diff --git a/wrapper.c b/wrapper.c
>> index bc1bfb8..69d1c9b 100644
>> --- a/wrapper.c
>> +++ b/wrapper.c
>> @@ -11,14 +11,18 @@ static void (*try_to_free_routine)(size_t size) = 
>> do_nothing;
>> 
>> static void memory_limit_check(size_t size)
>> {
>> -static int limit = -1;
>> -if (limit == -1) {
>> -const char *env = getenv("GIT_ALLOC_LIMIT");
>> -limit = env ? atoi(env) * 1024 : 0;
>> +static size_t limit = SIZE_MAX;
>> +if (limit == SIZE_MAX) {
> 
> You use SIZE_MAX as the sentinel for "not set", and 0 as the sentinel
> for "no limit". That seems kind of backwards.
> 
> I guess you are inheriting this from the existing code, which lets
> GIT_ALLOC_LIMIT=0 mean "no limit". I'm not sure if we want to keep that
> or not (it would be backwards incompatible to change it, but we are
> already breaking compatibility here by assuming bytes rather than
> kilobytes; I think that's OK because this is not a documented feature,
> or one intended to be used externally).

I think it's reasonable that GIT_ALLOC_LIMIT=0 means "no limit", so that
the limit can easily be disabled temporarily.

But I could change the sentinel and handle 0 like:

if (git_parse_ulong(env, &val)) {
if (!val) {
val = SIZE_MAX;
}
}

Maybe we should do this.



>> +const char *var = "GIT_ALLOC_LIMIT";
>> +unsigned long val = 0;
>> +const char *env = getenv(var);
>> +if (env && !git_parse_ulong(env, &val))
>> +die("Failed to parse %s", var);
>> +limit = val;
>>  }
> 
> This and the next patch both look OK to me, but I notice this part is
> largely duplicated between the two. We already have git_env_bool to do a
> similar thing for boolean environment variables. Should we do something
> similar like:
> 
> diff --git a/config.c b/config.c
> index 058505c..11919eb 100644
> --- a/config.c
> +++ b/config.c
> @@ -1122,6 +1122,14 @@ int git_env_bool(const char *k, int def)
>   return v ? git_config_bool(k, v) : def;
> }
> 
> +unsigned long git_env_ulong(const char *k, unsigned long val)
> +{
> + const char *v = getenv(k);
> + if (v && !git_parse_ulong(k, &val))
> + die("failed to parse %s", k);
> + return val;
> +}
> +
> int git_config_system(void)
> {
>   return !git_env_bool("GIT_CONFIG_NOSYSTEM", 0);
> 
> It's not a lot of code, but I think the callers end up being much easier
> to read:
> 
>  if (limit == SIZE_MAX)
>   limit = git_env_ulong("GIT_ALLOC_LIMIT", 0);

I think you're right.  I'll change it.


Steffen--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 3/4] Introduce GIT_MMAP_LIMIT to allow testing expected mmap size

2014-08-24 Thread Steffen Prohaska
Similar to testing expectations about malloc with GIT_ALLOC_LIMIT (see
commit d41489 and previous commit), it can be useful to test
expectations about mmap.

This introduces a new environment variable GIT_MMAP_LIMIT to limit the
largest allowed mmap length.  xmmap() is modified to check the limit.
Together with GIT_ALLOC_LIMIT tests can now easily confirm expectations
about memory consumption.

GIT_MMAP_LIMIT will be used in the next commit to test that data will be
streamed to an external filter without mmaping the entire file.

[commit d41489]: d41489a6424308dc9a0409bc2f6845aa08bd4f7d Add more large
blob test cases

Signed-off-by: Steffen Prohaska 
---
 sha1_file.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 00c07f2..3204f66 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -663,10 +663,29 @@ void release_pack_memory(size_t need)
; /* nothing */
 }
 
+static void mmap_limit_check(size_t length)
+{
+   static size_t limit = SIZE_MAX;
+   if (limit == SIZE_MAX) {
+   const char *var = "GIT_MMAP_LIMIT";
+   unsigned long val = 0;
+   const char *env = getenv(var);
+   if (env && !git_parse_ulong(env, &val))
+   die("Failed to parse %s", var);
+   limit = val;
+   }
+   if (limit && length > limit)
+   die("attempting to mmap %"PRIuMAX" over limit %"PRIuMAX,
+   (uintmax_t)length, (uintmax_t)limit);
+}
+
 void *xmmap(void *start, size_t length,
int prot, int flags, int fd, off_t offset)
 {
-   void *ret = mmap(start, length, prot, flags, fd, offset);
+   void *ret;
+
+   mmap_limit_check(length);
+   ret = mmap(start, length, prot, flags, fd, offset);
if (ret == MAP_FAILED) {
if (!length)
return NULL;
-- 
2.1.0.8.gd3b6067

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 0/4] Stream fd to clean filter; GIT_MMAP_LIMIT, GIT_ALLOC_LIMIT with git_parse_ulong()

2014-08-24 Thread Steffen Prohaska
Changes since v4: use git_parse_ulong() to parse env vars.

Steffen Prohaska (4):
  convert: Refactor would_convert_to_git() to single arg 'path'
  Change GIT_ALLOC_LIMIT check to use git_parse_ulong()
  Introduce GIT_MMAP_LIMIT to allow testing expected mmap size
  convert: Stream from fd to required clean filter instead of mmap

 convert.c | 60 +--
 convert.h | 10 ++---
 sha1_file.c   | 50 +++---
 t/t0021-conversion.sh | 24 -
 t/t1050-large.sh  |  2 +-
 wrapper.c | 16 --
 6 files changed, 138 insertions(+), 24 deletions(-)

-- 
2.1.0.8.gd3b6067

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong()

2014-08-24 Thread Steffen Prohaska
GIT_ALLOC_LIMIT limits xmalloc()'s size, which is of type size_t.
Better use git_parse_ulong() to parse the environment variable, so that
the postfixes 'k', 'm', and 'g' can be used; and use size_t to store the
limit for consistency.  The change to size_t has no direct practical
impact, because we use GIT_ALLOC_LIMIT to test small sizes.

The cast of size in the call to die() is changed to uintmax_t to match
the format string PRIuMAX.

Signed-off-by: Steffen Prohaska 
---
 t/t1050-large.sh |  2 +-
 wrapper.c| 16 ++--
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index aea4936..e7657ab 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -13,7 +13,7 @@ test_expect_success setup '
echo X | dd of=large2 bs=1k seek=2000 &&
echo X | dd of=large3 bs=1k seek=2000 &&
echo Y | dd of=huge bs=1k seek=2500 &&
-   GIT_ALLOC_LIMIT=1500 &&
+   GIT_ALLOC_LIMIT=1500k &&
export GIT_ALLOC_LIMIT
 '
 
diff --git a/wrapper.c b/wrapper.c
index bc1bfb8..69d1c9b 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -11,14 +11,18 @@ static void (*try_to_free_routine)(size_t size) = 
do_nothing;
 
 static void memory_limit_check(size_t size)
 {
-   static int limit = -1;
-   if (limit == -1) {
-   const char *env = getenv("GIT_ALLOC_LIMIT");
-   limit = env ? atoi(env) * 1024 : 0;
+   static size_t limit = SIZE_MAX;
+   if (limit == SIZE_MAX) {
+   const char *var = "GIT_ALLOC_LIMIT";
+   unsigned long val = 0;
+   const char *env = getenv(var);
+   if (env && !git_parse_ulong(env, &val))
+   die("Failed to parse %s", var);
+   limit = val;
}
if (limit && size > limit)
-   die("attempting to allocate %"PRIuMAX" over limit %d",
-   (intmax_t)size, limit);
+   die("attempting to allocate %"PRIuMAX" over limit %"PRIuMAX,
+   (uintmax_t)size, (uintmax_t)limit);
 }
 
 try_to_free_t set_try_to_free_routine(try_to_free_t routine)
-- 
2.1.0.8.gd3b6067

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 1/4] convert: Refactor would_convert_to_git() to single arg 'path'

2014-08-24 Thread Steffen Prohaska
It is only the path that matters in the decision whether to filter or
not.  Clarify this by making path the single argument of
would_convert_to_git().

Signed-off-by: Steffen Prohaska 
---
 convert.h   | 5 ++---
 sha1_file.c | 2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/convert.h b/convert.h
index 0c2143c..c638b33 100644
--- a/convert.h
+++ b/convert.h
@@ -40,10 +40,9 @@ extern int convert_to_working_tree(const char *path, const 
char *src,
   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
  struct strbuf *dst);
-static inline int would_convert_to_git(const char *path, const char *src,
-  size_t len, enum safe_crlf checksafe)
+static inline int would_convert_to_git(const char *path)
 {
-   return convert_to_git(path, src, len, NULL, checksafe);
+   return convert_to_git(path, NULL, 0, NULL, 0);
 }
 
 /*
diff --git a/sha1_file.c b/sha1_file.c
index 3f70b1d..00c07f2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3144,7 +3144,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
if (!S_ISREG(st->st_mode))
ret = index_pipe(sha1, fd, type, path, flags);
else if (size <= big_file_threshold || type != OBJ_BLOB ||
-(path && would_convert_to_git(path, NULL, 0, 0)))
+(path && would_convert_to_git(path)))
ret = index_core(sha1, fd, size, type, path, flags);
else
ret = index_stream(sha1, fd, size, type, path, flags);
-- 
2.1.0.8.gd3b6067

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap

2014-08-24 Thread Steffen Prohaska
The data is streamed to the filter process anyway.  Better avoid mapping
the file if possible.  This is especially useful if a clean filter
reduces the size, for example if it computes a sha1 for binary data,
like git media.  The file size that the previous implementation could
handle was limited by the available address space; large files for
example could not be handled with (32-bit) msysgit.  The new
implementation can filter files of any size as long as the filter output
is small enough.

The new code path is only taken if the filter is required.  The filter
consumes data directly from the fd.  The original data is not available
to git, so it must fail if the filter fails.

The environment variable GIT_MMAP_LIMIT, which has been introduced in
the previous commit is used to test that the expected code path is
taken.  A related test that exercises required filters is modified to
verify that the data actually has been modified on its way from the file
system to the object store.

Signed-off-by: Steffen Prohaska 
---
 convert.c | 60 +--
 convert.h |  5 +
 sha1_file.c   | 27 ++-
 t/t0021-conversion.sh | 24 -
 4 files changed, 104 insertions(+), 12 deletions(-)

diff --git a/convert.c b/convert.c
index cb5fbb4..463f6de 100644
--- a/convert.c
+++ b/convert.c
@@ -312,11 +312,12 @@ static int crlf_to_worktree(const char *path, const char 
*src, size_t len,
 struct filter_params {
const char *src;
unsigned long size;
+   int fd;
const char *cmd;
const char *path;
 };
 
-static int filter_buffer(int in, int out, void *data)
+static int filter_buffer_or_fd(int in, int out, void *data)
 {
/*
 * Spawn cmd and feed the buffer contents through its stdin.
@@ -325,6 +326,7 @@ static int filter_buffer(int in, int out, void *data)
struct filter_params *params = (struct filter_params *)data;
int write_err, status;
const char *argv[] = { NULL, NULL };
+   int fd;
 
/* apply % substitution to cmd */
struct strbuf cmd = STRBUF_INIT;
@@ -355,7 +357,17 @@ static int filter_buffer(int in, int out, void *data)
 
sigchain_push(SIGPIPE, SIG_IGN);
 
-   write_err = (write_in_full(child_process.in, params->src, params->size) 
< 0);
+   if (params->src) {
+   write_err = (write_in_full(child_process.in, params->src, 
params->size) < 0);
+   } else {
+   /* dup(), because copy_fd() closes the input fd. */
+   fd = dup(params->fd);
+   if (fd < 0)
+   write_err = error("failed to dup file descriptor.");
+   else
+   write_err = copy_fd(fd, child_process.in);
+   }
+
if (close(child_process.in))
write_err = 1;
if (write_err)
@@ -371,7 +383,7 @@ static int filter_buffer(int in, int out, void *data)
return (write_err || status);
 }
 
-static int apply_filter(const char *path, const char *src, size_t len,
+static int apply_filter(const char *path, const char *src, size_t len, int fd,
 struct strbuf *dst, const char *cmd)
 {
/*
@@ -392,11 +404,12 @@ static int apply_filter(const char *path, const char 
*src, size_t len,
return 1;
 
memset(&async, 0, sizeof(async));
-   async.proc = filter_buffer;
+   async.proc = filter_buffer_or_fd;
async.data = ¶ms;
async.out = -1;
params.src = src;
params.size = len;
+   params.fd = fd;
params.cmd = cmd;
params.path = path;
 
@@ -747,6 +760,24 @@ static void convert_attrs(struct conv_attrs *ca, const 
char *path)
}
 }
 
+int would_convert_to_git_filter_fd(const char *path)
+{
+   struct conv_attrs ca;
+
+   convert_attrs(&ca, path);
+   if (!ca.drv)
+   return 0;
+
+   /* Apply a filter to an fd only if the filter is required to succeed.
+* We must die if the filter fails, because the original data before
+* filtering is not available.
+*/
+   if (!ca.drv->required)
+   return 0;
+
+   return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
+}
+
 int convert_to_git(const char *path, const char *src, size_t len,
struct strbuf *dst, enum safe_crlf checksafe)
 {
@@ -761,7 +792,7 @@ int convert_to_git(const char *path, const char *src, 
size_t len,
required = ca.drv->required;
}
 
-   ret |= apply_filter(path, src, len, dst, filter);
+   ret |= apply_filter(path, src, len, -1, dst, filter);
if (!ret && required)
die("%s: clean filter '%s' failed", path, ca.drv->name);
 
@@ -778,6 +809,23 @@ int convert_to_git(const char *path, const char *src, 
size_t len,
return ret | ident_to_git(path, src, len, dst, ca.ide

Re: [PATCH v3 2/3] Introduce GIT_MMAP_LIMIT to allow testing expected mmap size

2014-08-24 Thread Steffen Prohaska
On Aug 22, 2014, at 6:31 PM, Junio C Hamano  wrote:

> Steffen Prohaska  writes:
> 
>>>> +  if (limit == -1) {
>>>> +  const char *env = getenv("GIT_MMAP_LIMIT");
>>>> +  limit = env ? atoi(env) * 1024 : 0;
>> 
>> ... this should then be changed to atol(env), and ... 
> 
> In the real codepath (not debugging aid like this) we try to avoid
> atoi/atol so that we can catch errors like feeding "123Q" and
> parsing it as 123.
> 
> But it is OK to be loose in an debugging aid.  If I were doing
> this code, I actually would call git_parse_ulong() and not
> define it in terms of kilobytes, though.

Thanks for suggesting this.  I wasn't aware of git_parse_ulong().
I think it makes very much sense to use it, even though it's only a
testing aid.

I'll send a PATCH v5 series.

Steffen

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 4/4] convert: Stream from fd to required clean filter instead of mmap

2014-08-22 Thread Steffen Prohaska
The data is streamed to the filter process anyway.  Better avoid mapping
the file if possible.  This is especially useful if a clean filter
reduces the size, for example if it computes a sha1 for binary data,
like git media.  The file size that the previous implementation could
handle was limited by the available address space; large files for
example could not be handled with (32-bit) msysgit.  The new
implementation can filter files of any size as long as the filter output
is small enough.

The new code path is only taken if the filter is required.  The filter
consumes data directly from the fd.  The original data is not available
to git, so it must fail if the filter fails.

The environment variable GIT_MMAP_LIMIT, which has been introduced in
the previous commit is used to test that the expected code path is
taken.  A related test that exercises required filters is modified to
verify that the data actually has been modified on its way from the file
system to the object store.

Signed-off-by: Steffen Prohaska 
---
 convert.c | 60 +--
 convert.h |  5 +
 sha1_file.c   | 27 ++-
 t/t0021-conversion.sh | 24 -
 4 files changed, 104 insertions(+), 12 deletions(-)

diff --git a/convert.c b/convert.c
index cb5fbb4..463f6de 100644
--- a/convert.c
+++ b/convert.c
@@ -312,11 +312,12 @@ static int crlf_to_worktree(const char *path, const char 
*src, size_t len,
 struct filter_params {
const char *src;
unsigned long size;
+   int fd;
const char *cmd;
const char *path;
 };
 
-static int filter_buffer(int in, int out, void *data)
+static int filter_buffer_or_fd(int in, int out, void *data)
 {
/*
 * Spawn cmd and feed the buffer contents through its stdin.
@@ -325,6 +326,7 @@ static int filter_buffer(int in, int out, void *data)
struct filter_params *params = (struct filter_params *)data;
int write_err, status;
const char *argv[] = { NULL, NULL };
+   int fd;
 
/* apply % substitution to cmd */
struct strbuf cmd = STRBUF_INIT;
@@ -355,7 +357,17 @@ static int filter_buffer(int in, int out, void *data)
 
sigchain_push(SIGPIPE, SIG_IGN);
 
-   write_err = (write_in_full(child_process.in, params->src, params->size) 
< 0);
+   if (params->src) {
+   write_err = (write_in_full(child_process.in, params->src, 
params->size) < 0);
+   } else {
+   /* dup(), because copy_fd() closes the input fd. */
+   fd = dup(params->fd);
+   if (fd < 0)
+   write_err = error("failed to dup file descriptor.");
+   else
+   write_err = copy_fd(fd, child_process.in);
+   }
+
if (close(child_process.in))
write_err = 1;
if (write_err)
@@ -371,7 +383,7 @@ static int filter_buffer(int in, int out, void *data)
return (write_err || status);
 }
 
-static int apply_filter(const char *path, const char *src, size_t len,
+static int apply_filter(const char *path, const char *src, size_t len, int fd,
 struct strbuf *dst, const char *cmd)
 {
/*
@@ -392,11 +404,12 @@ static int apply_filter(const char *path, const char 
*src, size_t len,
return 1;
 
memset(&async, 0, sizeof(async));
-   async.proc = filter_buffer;
+   async.proc = filter_buffer_or_fd;
async.data = ¶ms;
async.out = -1;
params.src = src;
params.size = len;
+   params.fd = fd;
params.cmd = cmd;
params.path = path;
 
@@ -747,6 +760,24 @@ static void convert_attrs(struct conv_attrs *ca, const 
char *path)
}
 }
 
+int would_convert_to_git_filter_fd(const char *path)
+{
+   struct conv_attrs ca;
+
+   convert_attrs(&ca, path);
+   if (!ca.drv)
+   return 0;
+
+   /* Apply a filter to an fd only if the filter is required to succeed.
+* We must die if the filter fails, because the original data before
+* filtering is not available.
+*/
+   if (!ca.drv->required)
+   return 0;
+
+   return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
+}
+
 int convert_to_git(const char *path, const char *src, size_t len,
struct strbuf *dst, enum safe_crlf checksafe)
 {
@@ -761,7 +792,7 @@ int convert_to_git(const char *path, const char *src, 
size_t len,
required = ca.drv->required;
}
 
-   ret |= apply_filter(path, src, len, dst, filter);
+   ret |= apply_filter(path, src, len, -1, dst, filter);
if (!ret && required)
die("%s: clean filter '%s' failed", path, ca.drv->name);
 
@@ -778,6 +809,23 @@ int convert_to_git(const char *path, const char *src, 
size_t len,
return ret | ident_to_git(path, src, len, dst, ca.ide

[PATCH v4 0/4] Stream fd to clean filter, GIT_MMAP_LIMIT, GIT_ALLOC_LIMIT

2014-08-22 Thread Steffen Prohaska
Only changes since PATCH v3: Use ssize_t to store limits.

Steffen Prohaska (4):
  convert: Refactor would_convert_to_git() to single arg 'path'
  Introduce GIT_MMAP_LIMIT to allow testing expected mmap size
  Change GIT_ALLOC_LIMIT check to use ssize_t for consistency
  convert: Stream from fd to required clean filter instead of mmap

 convert.c | 60 +--
 convert.h | 10 ++---
 sha1_file.c   | 46 ---
 t/t0021-conversion.sh | 24 -
 wrapper.c |  8 +++
 5 files changed, 127 insertions(+), 21 deletions(-)

-- 
2.1.0.6.gb452461

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 2/4] Introduce GIT_MMAP_LIMIT to allow testing expected mmap size

2014-08-22 Thread Steffen Prohaska
Similar to testing expectations about malloc with GIT_ALLOC_LIMIT (see
commit d41489), it can be useful to test expectations about mmap.

This introduces a new environment variable GIT_MMAP_LIMIT to limit the
largest allowed mmap length (in KB).  xmmap() is modified to check the
limit.  Together with GIT_ALLOC_LIMIT tests can now easily confirm
expectations about memory consumption.

GIT_MMAP_LIMIT will be used in the next commit to test that data will be
streamed to an external filter without mmaping the entire file.

[commit d41489]: d41489a6424308dc9a0409bc2f6845aa08bd4f7d Add more large
blob test cases

Signed-off-by: Steffen Prohaska 
---
 sha1_file.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 00c07f2..603673b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -663,10 +663,25 @@ void release_pack_memory(size_t need)
; /* nothing */
 }
 
+static void mmap_limit_check(size_t length)
+{
+   static ssize_t limit = -1;
+   if (limit == -1) {
+   const char *env = getenv("GIT_MMAP_LIMIT");
+   limit = env ? atol(env) * 1024 : 0;
+   }
+   if (limit && length > limit)
+   die("attempting to mmap %"PRIuMAX" over limit %"PRIuMAX,
+   (uintmax_t)length, (uintmax_t)limit);
+}
+
 void *xmmap(void *start, size_t length,
int prot, int flags, int fd, off_t offset)
 {
-   void *ret = mmap(start, length, prot, flags, fd, offset);
+   void *ret;
+
+   mmap_limit_check(length);
+   ret = mmap(start, length, prot, flags, fd, offset);
if (ret == MAP_FAILED) {
if (!length)
return NULL;
-- 
2.1.0.6.gb452461

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 1/4] convert: Refactor would_convert_to_git() to single arg 'path'

2014-08-22 Thread Steffen Prohaska
It is only the path that matters in the decision whether to filter or
not.  Clarify this by making path the single argument of
would_convert_to_git().

Signed-off-by: Steffen Prohaska 
---
 convert.h   | 5 ++---
 sha1_file.c | 2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/convert.h b/convert.h
index 0c2143c..c638b33 100644
--- a/convert.h
+++ b/convert.h
@@ -40,10 +40,9 @@ extern int convert_to_working_tree(const char *path, const 
char *src,
   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
  struct strbuf *dst);
-static inline int would_convert_to_git(const char *path, const char *src,
-  size_t len, enum safe_crlf checksafe)
+static inline int would_convert_to_git(const char *path)
 {
-   return convert_to_git(path, src, len, NULL, checksafe);
+   return convert_to_git(path, NULL, 0, NULL, 0);
 }
 
 /*
diff --git a/sha1_file.c b/sha1_file.c
index 3f70b1d..00c07f2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3144,7 +3144,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
if (!S_ISREG(st->st_mode))
ret = index_pipe(sha1, fd, type, path, flags);
else if (size <= big_file_threshold || type != OBJ_BLOB ||
-(path && would_convert_to_git(path, NULL, 0, 0)))
+(path && would_convert_to_git(path)))
ret = index_core(sha1, fd, size, type, path, flags);
else
ret = index_stream(sha1, fd, size, type, path, flags);
-- 
2.1.0.6.gb452461

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 3/4] Change GIT_ALLOC_LIMIT check to use ssize_t for consistency

2014-08-22 Thread Steffen Prohaska
GIT_ALLOC_LIMIT limits xmalloc()'s size, which is of type size_t.  Use
ssize_t to store the limit for consistency.  The change has no direct
practical impact, because we use GIT_ALLOC_LIMIT to test small sizes.

The cast of size in the call to die() is changed to uintmax_t to match
the format string PRIuMAX.

Signed-off-by: Steffen Prohaska 
---
 wrapper.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/wrapper.c b/wrapper.c
index bc1bfb8..e91f6e9 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -11,14 +11,14 @@ static void (*try_to_free_routine)(size_t size) = 
do_nothing;
 
 static void memory_limit_check(size_t size)
 {
-   static int limit = -1;
+   static ssize_t limit = -1;
if (limit == -1) {
const char *env = getenv("GIT_ALLOC_LIMIT");
-   limit = env ? atoi(env) * 1024 : 0;
+   limit = env ? atol(env) * 1024 : 0;
}
if (limit && size > limit)
-   die("attempting to allocate %"PRIuMAX" over limit %d",
-   (intmax_t)size, limit);
+   die("attempting to allocate %"PRIuMAX" over limit %"PRIuMAX,
+   (uintmax_t)size, (uintmax_t)limit);
 }
 
 try_to_free_t set_try_to_free_routine(try_to_free_t routine)
-- 
2.1.0.6.gb452461

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/3] Introduce GIT_MMAP_LIMIT to allow testing expected mmap size

2014-08-22 Thread Steffen Prohaska

On Aug 22, 2014, at 12:26 AM, Junio C Hamano  wrote:

> Steffen Prohaska  writes:
> 
>> Similar to testing expectations about malloc with GIT_ALLOC_LIMIT (see
>> commit d41489), it can be useful to test expectations about mmap.
>> 
>> This introduces a new environment variable GIT_MMAP_LIMIT to limit the
>> largest allowed mmap length (in KB).  xmmap() is modified to check the
>> limit.  Together with GIT_ALLOC_LIMIT tests can now easily confirm
>> expectations about memory consumption.
>> 
>> GIT_ALLOC_LIMIT will be used in the next commit to test that data will
> 
> I smell the need for s/ALLOC/MMAP/ here, but perhaps you did mean
> ALLOC (I won't know until I check 3/3 ;-)

You are right.


>> diff --git a/sha1_file.c b/sha1_file.c
>> index 00c07f2..88d64c0 100644
>> --- a/sha1_file.c
>> +++ b/sha1_file.c
>> @@ -663,10 +663,25 @@ void release_pack_memory(size_t need)
>>  ; /* nothing */
>> }
>> 
>> +static void mmap_limit_check(size_t length)
>> +{
>> +static int limit = -1;
> 
> Perhaps you want ssize_t here?  I see mmap() as a tool to handle a
> lot more data than a single malloc() typically would ;-) so previous
> mistakes by other people would not be a good excuse.

Maybe; and ...


>> +if (limit == -1) {
>> +const char *env = getenv("GIT_MMAP_LIMIT");
>> +limit = env ? atoi(env) * 1024 : 0;

... this should then be changed to atol(env), and ... 


>> +}
>> +if (limit && length > limit)
>> +die("attempting to mmap %"PRIuMAX" over limit %d",
>> +(intmax_t)length, limit);

... here PRIuMAX and (uintmax_t); (uintmax_t) also for length.

I'll fix it and also GIT_ALLOC_LIMIT.

Steffen


>> +}
>> +
>> void *xmmap(void *start, size_t length,
>>  int prot, int flags, int fd, off_t offset)
>> {
>> -void *ret = mmap(start, length, prot, flags, fd, offset);
>> +void *ret;
>> +
>> +mmap_limit_check(length);
>> +ret = mmap(start, length, prot, flags, fd, offset);
>>  if (ret == MAP_FAILED) {
>>  if (!length)
>>  return NULL;

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 3/3] convert: Stream from fd to required clean filter instead of mmap

2014-08-21 Thread Steffen Prohaska
The data is streamed to the filter process anyway.  Better avoid mapping
the file if possible.  This is especially useful if a clean filter
reduces the size, for example if it computes a sha1 for binary data,
like git media.  The file size that the previous implementation could
handle was limited by the available address space; large files for
example could not be handled with (32-bit) msysgit.  The new
implementation can filter files of any size as long as the filter output
is small enough.

The new code path is only taken if the filter is required.  The filter
consumes data directly from the fd.  The original data is not available
to git, so it must fail if the filter fails.

The environment variable GIT_MMAP_LIMIT, which has been introduced in
the previous commit is used to test that the expected code path is
taken.  A related test that exercises required filters is modified to
verify that the data actually has been modified on its way from the file
system to the object store.

Signed-off-by: Steffen Prohaska 
---
 convert.c | 60 +--
 convert.h |  5 +
 sha1_file.c   | 27 ++-
 t/t0021-conversion.sh | 24 -
 4 files changed, 104 insertions(+), 12 deletions(-)

diff --git a/convert.c b/convert.c
index cb5fbb4..463f6de 100644
--- a/convert.c
+++ b/convert.c
@@ -312,11 +312,12 @@ static int crlf_to_worktree(const char *path, const char 
*src, size_t len,
 struct filter_params {
const char *src;
unsigned long size;
+   int fd;
const char *cmd;
const char *path;
 };
 
-static int filter_buffer(int in, int out, void *data)
+static int filter_buffer_or_fd(int in, int out, void *data)
 {
/*
 * Spawn cmd and feed the buffer contents through its stdin.
@@ -325,6 +326,7 @@ static int filter_buffer(int in, int out, void *data)
struct filter_params *params = (struct filter_params *)data;
int write_err, status;
const char *argv[] = { NULL, NULL };
+   int fd;
 
/* apply % substitution to cmd */
struct strbuf cmd = STRBUF_INIT;
@@ -355,7 +357,17 @@ static int filter_buffer(int in, int out, void *data)
 
sigchain_push(SIGPIPE, SIG_IGN);
 
-   write_err = (write_in_full(child_process.in, params->src, params->size) 
< 0);
+   if (params->src) {
+   write_err = (write_in_full(child_process.in, params->src, 
params->size) < 0);
+   } else {
+   /* dup(), because copy_fd() closes the input fd. */
+   fd = dup(params->fd);
+   if (fd < 0)
+   write_err = error("failed to dup file descriptor.");
+   else
+   write_err = copy_fd(fd, child_process.in);
+   }
+
if (close(child_process.in))
write_err = 1;
if (write_err)
@@ -371,7 +383,7 @@ static int filter_buffer(int in, int out, void *data)
return (write_err || status);
 }
 
-static int apply_filter(const char *path, const char *src, size_t len,
+static int apply_filter(const char *path, const char *src, size_t len, int fd,
 struct strbuf *dst, const char *cmd)
 {
/*
@@ -392,11 +404,12 @@ static int apply_filter(const char *path, const char 
*src, size_t len,
return 1;
 
memset(&async, 0, sizeof(async));
-   async.proc = filter_buffer;
+   async.proc = filter_buffer_or_fd;
async.data = ¶ms;
async.out = -1;
params.src = src;
params.size = len;
+   params.fd = fd;
params.cmd = cmd;
params.path = path;
 
@@ -747,6 +760,24 @@ static void convert_attrs(struct conv_attrs *ca, const 
char *path)
}
 }
 
+int would_convert_to_git_filter_fd(const char *path)
+{
+   struct conv_attrs ca;
+
+   convert_attrs(&ca, path);
+   if (!ca.drv)
+   return 0;
+
+   /* Apply a filter to an fd only if the filter is required to succeed.
+* We must die if the filter fails, because the original data before
+* filtering is not available.
+*/
+   if (!ca.drv->required)
+   return 0;
+
+   return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
+}
+
 int convert_to_git(const char *path, const char *src, size_t len,
struct strbuf *dst, enum safe_crlf checksafe)
 {
@@ -761,7 +792,7 @@ int convert_to_git(const char *path, const char *src, 
size_t len,
required = ca.drv->required;
}
 
-   ret |= apply_filter(path, src, len, dst, filter);
+   ret |= apply_filter(path, src, len, -1, dst, filter);
if (!ret && required)
die("%s: clean filter '%s' failed", path, ca.drv->name);
 
@@ -778,6 +809,23 @@ int convert_to_git(const char *path, const char *src, 
size_t len,
return ret | ident_to_git(path, src, len, dst, ca.ide

[PATCH v3 0/3] Stream fd to clean filter, GIT_MMAP_LIMIT

2014-08-21 Thread Steffen Prohaska
I revised the testing approach as discussed.  Patch 2/3 adds GIT_MMAP_LIMIT,
which allows testing of memory expectations together with GIT_ALLOC_LIMIT.

The rest is unchanged compared to v2.

Steffen Prohaska (3):
  convert: Refactor would_convert_to_git() to single arg 'path'
  Introduce GIT_MMAP_LIMIT to allow testing expected mmap size
  convert: Stream from fd to required clean filter instead of mmap

 convert.c | 60 +--
 convert.h | 10 ++---
 sha1_file.c   | 46 ---
 t/t0021-conversion.sh | 24 -
 4 files changed, 123 insertions(+), 17 deletions(-)

-- 
2.1.0.6.gb452461

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/3] Introduce GIT_MMAP_LIMIT to allow testing expected mmap size

2014-08-21 Thread Steffen Prohaska
Similar to testing expectations about malloc with GIT_ALLOC_LIMIT (see
commit d41489), it can be useful to test expectations about mmap.

This introduces a new environment variable GIT_MMAP_LIMIT to limit the
largest allowed mmap length (in KB).  xmmap() is modified to check the
limit.  Together with GIT_ALLOC_LIMIT tests can now easily confirm
expectations about memory consumption.

GIT_ALLOC_LIMIT will be used in the next commit to test that data will
be streamed to an external filter without mmaping the entire file.

[commit d41489]: d41489a6424308dc9a0409bc2f6845aa08bd4f7d Add more large
blob test cases

Signed-off-by: Steffen Prohaska 
---
 sha1_file.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 00c07f2..88d64c0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -663,10 +663,25 @@ void release_pack_memory(size_t need)
; /* nothing */
 }
 
+static void mmap_limit_check(size_t length)
+{
+   static int limit = -1;
+   if (limit == -1) {
+   const char *env = getenv("GIT_MMAP_LIMIT");
+   limit = env ? atoi(env) * 1024 : 0;
+   }
+   if (limit && length > limit)
+   die("attempting to mmap %"PRIuMAX" over limit %d",
+   (intmax_t)length, limit);
+}
+
 void *xmmap(void *start, size_t length,
int prot, int flags, int fd, off_t offset)
 {
-   void *ret = mmap(start, length, prot, flags, fd, offset);
+   void *ret;
+
+   mmap_limit_check(length);
+   ret = mmap(start, length, prot, flags, fd, offset);
if (ret == MAP_FAILED) {
if (!length)
return NULL;
-- 
2.1.0.6.gb452461

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/3] convert: Refactor would_convert_to_git() to single arg 'path'

2014-08-21 Thread Steffen Prohaska
It is only the path that matters in the decision whether to filter or
not.  Clarify this by making path the single argument of
would_convert_to_git().

Signed-off-by: Steffen Prohaska 
---
 convert.h   | 5 ++---
 sha1_file.c | 2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/convert.h b/convert.h
index 0c2143c..c638b33 100644
--- a/convert.h
+++ b/convert.h
@@ -40,10 +40,9 @@ extern int convert_to_working_tree(const char *path, const 
char *src,
   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
  struct strbuf *dst);
-static inline int would_convert_to_git(const char *path, const char *src,
-  size_t len, enum safe_crlf checksafe)
+static inline int would_convert_to_git(const char *path)
 {
-   return convert_to_git(path, src, len, NULL, checksafe);
+   return convert_to_git(path, NULL, 0, NULL, 0);
 }
 
 /*
diff --git a/sha1_file.c b/sha1_file.c
index 3f70b1d..00c07f2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3144,7 +3144,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
if (!S_ISREG(st->st_mode))
ret = index_pipe(sha1, fd, type, path, flags);
else if (size <= big_file_threshold || type != OBJ_BLOB ||
-(path && would_convert_to_git(path, NULL, 0, 0)))
+(path && would_convert_to_git(path)))
ret = index_core(sha1, fd, size, type, path, flags);
else
ret = index_stream(sha1, fd, size, type, path, flags);
-- 
2.1.0.6.gb452461

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] convert: Stream from fd to required clean filter instead of mmap

2014-08-19 Thread Steffen Prohaska

On Aug 19, 2014, at 9:53 AM, Jeff King  wrote:

>>> For files >2GB on a 32-bit system (e.g. msysgit), filtering with the
>>> previous code always failed.  Now it works.  I created the patch to
>>> change git from 'fundamentally doesn't handle this' to 'works as
>>> expected'.
>> 
>> I deal with similar problem and added $GIT_ALLOC_LIMIT to test large
>> blob code. Maybe we could add $GIT_MMAP_LIMIT? I don't suppose we call
>> xmmap/xmalloc so often that the extra test would slow git down.
> 
> Yeah, I think I'd prefer that approach. We should mmap _way_ less than
> we malloc, and I do not think the malloc check has caused any problems.

I am going to update the patch accordingly.

Steffen

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] convert: Stream from fd to required clean filter instead of mmap

2014-08-17 Thread Steffen Prohaska

On Aug 17, 2014, at 9:27 AM, Jeff King  wrote:

> On Sat, Aug 16, 2014 at 06:26:08PM +0200, Steffen Prohaska wrote:
> 
>>> Is the 15MB limit supposed to be imposed somewhere or is it just a guide
>>> of how much memory we expect Git to use in this scenario?
>> 
>> The test should confirm that the the file that is added is not mmapped
>> to memory.  The process size should be relatively small independently
>> of the size of the file that is added.  I wanted to keep the file size
>> small.  The chosen sizes worked for me on Mac and Linux.
> 
> Measuring memory usage seems inherently a bit flaky for the test suite.
> It's also a little out of place, as the test suite is generally about
> correctness and outcomes, and this is a performance issue.

For files >2GB on a 32-bit system (e.g. msysgit), filtering with the previous 
code always failed.  Now it works.  I created the patch to change git from 
'fundamentally doesn't handle this' to 'works as expected'.


> Would it make more sense to construct a t/perf test that shows off the
> change? I suppose the run-time change may not be that impressive, but it
> would be cool if t/perf could measure max memory use, too. Then we can
> just compare results between versions, which is enough to detect
> regressions.

I wasn't aware of t/perf.  Thanks for suggesting this.

I agree that testing memory usage might be a bit flaky.  t/perf might indeed be 
a better place.

I'm not yet entirely convinced, though.  I'm wondering whether the proposed 
test would be robust enough with a large enough threshold to keep it in the 
main test suite.

Steffen
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] convert: Stream from fd to required clean filter instead of mmap

2014-08-17 Thread Steffen Prohaska

On Aug 16, 2014, at 7:00 PM, Andreas Schwab  wrote:

> Steffen Prohaska  writes:
> 
>> The test should confirm that the the file that is added is not mmapped to
>> memory.
> 
> RSS doesn't tell you that.  You can mmap a big file without RSS getting
> bigger.

All data are accessed after mapping, so RSS should be fine.  The test always 
did what I expected.

Steffen

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] convert: Stream from fd to required clean filter instead of mmap

2014-08-16 Thread Steffen Prohaska

On Aug 16, 2014, at 12:27 PM, John Keeping  wrote:

>> +test_expect_success HAVE_MAX_MEM_USAGE \
>> +'filtering large input to small output should use little memory' '
>> +git config filter.devnull.clean "cat >/dev/null" &&
>> +git config filter.devnull.required true &&
>> +for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
>> +echo "30MB filter=devnull" >.gitattributes &&
>> +max_mem_usage_is_lt_KB 15000 git add 30MB
>> +'
> 
> This test fails for me:
> 
> -- 8< --
> expecting success: 
>git config filter.devnull.clean "cat >/dev/null" &&
>git config filter.devnull.required true &&
>for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
>echo "30MB filter=devnull" >.gitattributes &&
>max_mem_usage_is_lt_KB 15000 git add 30MB
> 
> Command used too much memory (expected limit 15000KB, actual usage 15808KB).
> not ok 8 - filtering large input to small output should use little memory
> -- >8 --
> 
> This is on Linux 3.16 x86_64 with GCC 4.8.3 and glibc 2.19.  My GCC also
> has "-fstack-protector" and "-D_FORTIFY_SOURCE=2" enabled by default;
> turning those off for Git decreased the memory usage by about 500KB but
> not enough to make the test pass.  Of course, all of the libraries Git
> is linking are also compiled with those flags...
> 
> Is the 15MB limit supposed to be imposed somewhere or is it just a guide
> of how much memory we expect Git to use in this scenario?

The test should confirm that the the file that is added is not mmapped to 
memory.  The process size should be relatively small independently of the size 
of the file that is added.  I wanted to keep the file size small.  The chosen 
sizes worked for me on Mac and Linux.

A simple solution could be to increase the size of the test file and the limit. 
 I'd suggest to squash the diff below.  The test should still run reasonably 
fast with a 50 MB test file.

Junio, shall I send a whole updated patch series?

Steffen


diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 2cb2414..43de87d 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -243,9 +243,9 @@ test_expect_success HAVE_MAX_MEM_USAGE \
 'filtering large input to small output should use little memory' '
git config filter.devnull.clean "cat >/dev/null" &&
git config filter.devnull.required true &&
-   for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
-   echo "30MB filter=devnull" >.gitattributes &&
-   max_mem_usage_is_lt_KB 15000 git add 30MB
+   for i in $(test_seq 1 50); do printf "%1048576d" 1; done >50MB &&
+   echo "50MB filter=devnull" >.gitattributes &&
+   max_mem_usage_is_lt_KB 35000 git add 50MB
 '




--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/2] convert: Refactor would_convert_to_git() to single arg 'path'

2014-08-05 Thread Steffen Prohaska
It is only the path that matters in the decision whether to filter or
not.  Clarify this by making path the single argument of
would_convert_to_git().

Signed-off-by: Steffen Prohaska 
---
 convert.h   | 5 ++---
 sha1_file.c | 2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/convert.h b/convert.h
index 0c2143c..c638b33 100644
--- a/convert.h
+++ b/convert.h
@@ -40,10 +40,9 @@ extern int convert_to_working_tree(const char *path, const 
char *src,
   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
  struct strbuf *dst);
-static inline int would_convert_to_git(const char *path, const char *src,
-  size_t len, enum safe_crlf checksafe)
+static inline int would_convert_to_git(const char *path)
 {
-   return convert_to_git(path, src, len, NULL, checksafe);
+   return convert_to_git(path, NULL, 0, NULL, 0);
 }
 
 /*
diff --git a/sha1_file.c b/sha1_file.c
index 3f70b1d..00c07f2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3144,7 +3144,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
if (!S_ISREG(st->st_mode))
ret = index_pipe(sha1, fd, type, path, flags);
else if (size <= big_file_threshold || type != OBJ_BLOB ||
-(path && would_convert_to_git(path, NULL, 0, 0)))
+(path && would_convert_to_git(path)))
ret = index_core(sha1, fd, size, type, path, flags);
else
ret = index_stream(sha1, fd, size, type, path, flags);
-- 
2.1.0.rc1.6.gb569bd8

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/2] Stream fd to clean filter

2014-08-05 Thread Steffen Prohaska
The main difference to the previous version is that I've split off the
refactoring into a separate commit.  The rest is polishing the style.

Steffen Prohaska (2):
  convert: Refactor would_convert_to_git() to single arg 'path'
  convert: Stream from fd to required clean filter instead of mmap

 convert.c | 60 -
 convert.h | 10 +---
 sha1_file.c   | 29 --
 t/t0021-conversion.sh | 68 +++
 4 files changed, 151 insertions(+), 16 deletions(-)

-- 
2.1.0.rc1.6.gb569bd8

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/2] convert: Stream from fd to required clean filter instead of mmap

2014-08-05 Thread Steffen Prohaska
The data is streamed to the filter process anyway.  Better avoid mapping
the file if possible.  This is especially useful if a clean filter
reduces the size, for example if it computes a sha1 for binary data,
like git media.  The file size that the previous implementation could
handle was limited by the available address space; large files for
example could not be handled with (32-bit) msysgit.  The new
implementation can filter files of any size as long as the filter output
is small enough.

The new code path is only taken if the filter is required.  The filter
consumes data directly from the fd.  The original data is not available
to git, so it must fail if the filter fails.

The test that exercises required filters is modified to verify that the
data actually has been modified on its way from the file system to the
object store.

The expectation on the process size is tested using /usr/bin/time.  An
alternative would have been tcsh, which could be used to print memory
information as follows:

tcsh -c 'set time=(0 "%M"); '

Although the logic could perhaps be simplified with tcsh, I chose to use
'time' to avoid a dependency on tcsh.

Signed-off-by: Steffen Prohaska 
---
 convert.c | 60 -
 convert.h |  5 
 sha1_file.c   | 27 +++-
 t/t0021-conversion.sh | 68 +++
 4 files changed, 148 insertions(+), 12 deletions(-)

diff --git a/convert.c b/convert.c
index cb5fbb4..463f6de 100644
--- a/convert.c
+++ b/convert.c
@@ -312,11 +312,12 @@ static int crlf_to_worktree(const char *path, const char 
*src, size_t len,
 struct filter_params {
const char *src;
unsigned long size;
+   int fd;
const char *cmd;
const char *path;
 };
 
-static int filter_buffer(int in, int out, void *data)
+static int filter_buffer_or_fd(int in, int out, void *data)
 {
/*
 * Spawn cmd and feed the buffer contents through its stdin.
@@ -325,6 +326,7 @@ static int filter_buffer(int in, int out, void *data)
struct filter_params *params = (struct filter_params *)data;
int write_err, status;
const char *argv[] = { NULL, NULL };
+   int fd;
 
/* apply % substitution to cmd */
struct strbuf cmd = STRBUF_INIT;
@@ -355,7 +357,17 @@ static int filter_buffer(int in, int out, void *data)
 
sigchain_push(SIGPIPE, SIG_IGN);
 
-   write_err = (write_in_full(child_process.in, params->src, params->size) 
< 0);
+   if (params->src) {
+   write_err = (write_in_full(child_process.in, params->src, 
params->size) < 0);
+   } else {
+   /* dup(), because copy_fd() closes the input fd. */
+   fd = dup(params->fd);
+   if (fd < 0)
+   write_err = error("failed to dup file descriptor.");
+   else
+   write_err = copy_fd(fd, child_process.in);
+   }
+
if (close(child_process.in))
write_err = 1;
if (write_err)
@@ -371,7 +383,7 @@ static int filter_buffer(int in, int out, void *data)
return (write_err || status);
 }
 
-static int apply_filter(const char *path, const char *src, size_t len,
+static int apply_filter(const char *path, const char *src, size_t len, int fd,
 struct strbuf *dst, const char *cmd)
 {
/*
@@ -392,11 +404,12 @@ static int apply_filter(const char *path, const char 
*src, size_t len,
return 1;
 
memset(&async, 0, sizeof(async));
-   async.proc = filter_buffer;
+   async.proc = filter_buffer_or_fd;
async.data = ¶ms;
async.out = -1;
params.src = src;
params.size = len;
+   params.fd = fd;
params.cmd = cmd;
params.path = path;
 
@@ -747,6 +760,24 @@ static void convert_attrs(struct conv_attrs *ca, const 
char *path)
}
 }
 
+int would_convert_to_git_filter_fd(const char *path)
+{
+   struct conv_attrs ca;
+
+   convert_attrs(&ca, path);
+   if (!ca.drv)
+   return 0;
+
+   /* Apply a filter to an fd only if the filter is required to succeed.
+* We must die if the filter fails, because the original data before
+* filtering is not available.
+*/
+   if (!ca.drv->required)
+   return 0;
+
+   return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
+}
+
 int convert_to_git(const char *path, const char *src, size_t len,
struct strbuf *dst, enum safe_crlf checksafe)
 {
@@ -761,7 +792,7 @@ int convert_to_git(const char *path, const char *src, 
size_t len,
required = ca.drv->required;
}
 
-   ret |= apply_filter(path, src, len, dst, filter);
+   ret |= apply_filter(path, src, len, -1, dst, filter);
if (!ret && required)
die("%s: cle

Re: [PATCH] convert: Stream from fd to required clean filter instead of mmap

2014-08-05 Thread Steffen Prohaska
I'll address the comments about style in a revised patch.


On Aug 4, 2014, at 9:03 PM, Junio C Hamano  wrote:

>> +return apply_filter(path, 0, 0, -1, 0, ca.drv->clean);
> 
> What's the significance of "-1" here?  Does somebody in the
> callchain from apply_filter() check if fd < 0 and act differently
> (not a complaint nor rhetoric question)?

The decision in apply_filter() to return before the real work is
taken on the first 0.  Any value for fd would be ok.  The -1 is
only a reminder that the fd is invalid.

Steffen
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] convert: Stream from fd to required clean filter instead of mmap

2014-08-03 Thread Steffen Prohaska
The data is streamed to the filter process anyway.  Better avoid mapping
the file if possible.  This is especially useful if a clean filter
reduces the size, for example if it computes a sha1 for binary data,
like git media.  The file size that the previous implementation could
handle was limited by the available address space; large files for
example could not be handled with (32-bit) msysgit.  The new
implementation can filter files of any size as long as the filter output
is small enough.

The new code path is only taken if the filter is required.  The filter
consumes data directly from the fd.  The original data is not available
to git, so it must fail if the filter fails.

The test that exercises required filters is modified to verify that the
data actually has been modified on its way from the file system to the
object store.

The expectation on the process size is tested using /usr/bin/time.  An
alternative would have been tcsh, which could be used to print memory
information as follows:

tcsh -c 'set time=(0 "%M"); '

Although the logic could perhaps be simplified with tcsh, I chose to use
'time' to avoid a dependency on tcsh.

Signed-off-by: Steffen Prohaska 
---
 convert.c | 58 ++-
 convert.h | 10 +---
 sha1_file.c   | 29 --
 t/t0021-conversion.sh | 68 +++
 4 files changed, 149 insertions(+), 16 deletions(-)

diff --git a/convert.c b/convert.c
index cb5fbb4..58a516a 100644
--- a/convert.c
+++ b/convert.c
@@ -312,11 +312,12 @@ static int crlf_to_worktree(const char *path, const char 
*src, size_t len,
 struct filter_params {
const char *src;
unsigned long size;
+   int fd;
const char *cmd;
const char *path;
 };
 
-static int filter_buffer(int in, int out, void *data)
+static int filter_buffer_or_fd(int in, int out, void *data)
 {
/*
 * Spawn cmd and feed the buffer contents through its stdin.
@@ -325,6 +326,7 @@ static int filter_buffer(int in, int out, void *data)
struct filter_params *params = (struct filter_params *)data;
int write_err, status;
const char *argv[] = { NULL, NULL };
+   int fd;
 
/* apply % substitution to cmd */
struct strbuf cmd = STRBUF_INIT;
@@ -355,7 +357,17 @@ static int filter_buffer(int in, int out, void *data)
 
sigchain_push(SIGPIPE, SIG_IGN);
 
-   write_err = (write_in_full(child_process.in, params->src, params->size) 
< 0);
+   if (params->src) {
+   write_err = (write_in_full(child_process.in, params->src, 
params->size) < 0);
+   } else {
+   /* dup(), because copy_fd() closes the input fd. */
+   fd = dup(params->fd);
+   if (fd < 0)
+   write_err = error("failed to dup file descriptor.");
+   else
+   write_err = copy_fd(fd, child_process.in);
+   }
+
if (close(child_process.in))
write_err = 1;
if (write_err)
@@ -371,7 +383,7 @@ static int filter_buffer(int in, int out, void *data)
return (write_err || status);
 }
 
-static int apply_filter(const char *path, const char *src, size_t len,
+static int apply_filter(const char *path, const char *src, size_t len, int fd,
 struct strbuf *dst, const char *cmd)
 {
/*
@@ -392,11 +404,12 @@ static int apply_filter(const char *path, const char 
*src, size_t len,
return 1;
 
memset(&async, 0, sizeof(async));
-   async.proc = filter_buffer;
+   async.proc = filter_buffer_or_fd;
async.data = ¶ms;
async.out = -1;
params.src = src;
params.size = len;
+   params.fd = fd;
params.cmd = cmd;
params.path = path;
 
@@ -747,6 +760,22 @@ static void convert_attrs(struct conv_attrs *ca, const 
char *path)
}
 }
 
+int would_convert_to_git_filter_fd(const char *path) {
+   struct conv_attrs ca;
+   convert_attrs(&ca, path);
+
+   if (!ca.drv)
+   return 0;
+
+   /* Apply a filter to an fd only if the filter is required to succeed.
+* We must die if the filter fails, because the original data before
+* filtering is not available. */
+   if (!ca.drv->required)
+   return 0;
+
+   return apply_filter(path, 0, 0, -1, 0, ca.drv->clean);
+}
+
 int convert_to_git(const char *path, const char *src, size_t len,
struct strbuf *dst, enum safe_crlf checksafe)
 {
@@ -761,7 +790,7 @@ int convert_to_git(const char *path, const char *src, 
size_t len,
required = ca.drv->required;
}
 
-   ret |= apply_filter(path, src, len, dst, filter);
+   ret |= apply_filter(path, src, len, -1, dst, filter);
if (!ret && required)
die("%s: clean filter &#x

[PATCH v2] completion: Handle '!f() { ... }; f' and "!sh -c '...'" aliases

2014-06-12 Thread Steffen Prohaska
'!f() { ... }; f' and "!sh -c ''" are recommended patterns for
declaring more complex aliases (see git wiki [1]).  This commit teaches
the completion to handle them.

When determining which completion to use for an alias, an opening brace
or single quote is now skipped, and the search for a git command is
continued.  For example, the aliases '!f() { git commit ... }' or "!sh
-c 'git commit ...'" now trigger commit completion.  Previously, the
search stopped on the opening brace or quote, and the completion tried
it to determine how to complete, which obviously was useless.

The null command ':' is now skipped, so that it can be used as
a workaround to declare the desired completion style.  For example, the
aliases '!f() { : git commit ; if ...  ' and "!sh -c ': git commit; if
...'" now trigger commit completion.

Shell function declarations now work with or without space before
the parens, i.e. '!f() ...' and '!f () ...' both work.

[1] https://git.wiki.kernel.org/index.php/Aliases

Signed-off-by: Steffen Prohaska 
---
 contrib/completion/git-completion.bash | 10 ++
 t/t9902-completion.sh  | 27 +++
 2 files changed, 37 insertions(+)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 2c59a76..575f8f7 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -21,6 +21,12 @@
 #source ~/.git-completion.sh
 #3) Consider changing your PS1 to also show the current branch,
 #   see git-prompt.sh for details.
+#
+# If you use complex aliases of form '!f() { ... }; f', you can use the null
+# command ':' as the first command in the function body to declare the desired
+# completion style.  For example '!f() { : git commit ; ... }; f' will
+# tell the completion to use commit completion.  This also works with aliases
+# of form "!sh -c '...'".  For example, "!sh -c ': git commit ; ... '".
 
 case "$COMP_WORDBREAKS" in
 *:*) : great ;;
@@ -781,6 +787,10 @@ __git_aliased_command ()
-*) : option ;;
*=*): setting env ;;
git): git itself ;;
+   \(\))   : skip parens of shell function definition ;;
+   {)  : skip start of shell helper function ;;
+   :)  : skip null command ;;
+   \'*): skip opening quote after sh -c ;;
*)
echo "$word"
return
diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index 2d4beb5..1d1c106 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -550,6 +550,33 @@ test_expect_success 'complete files' '
test_completion "git add mom" "momified"
 '
 
+test_expect_success "completion uses  completion for alias: !sh -c 'git 
 ...'" '
+   test_config alias.co "!sh -c '"'"'git checkout ...'"'"'" &&
+   test_completion "git co m" <<-\EOF
+   master Z
+   mybranch Z
+   mytag Z
+   EOF
+'
+
+test_expect_success 'completion uses  completion for alias: !f () { 
VAR=val git  ... }' '
+   test_config alias.co "!f () { VAR=val git checkout ... ; } f" &&
+   test_completion "git co m" <<-\EOF
+   master Z
+   mybranch Z
+   mytag Z
+   EOF
+'
+
+test_expect_success 'completion used  completion for alias: !f() { : git 
 ; ... }' '
+   test_config alias.co "!f() { : git checkout ; if ... } f" &&
+   test_completion "git co m" <<-\EOF
+   master Z
+   mybranch Z
+   mytag Z
+   EOF
+'
+
 test_expect_failure 'complete with tilde expansion' '
git init tmp && cd tmp &&
test_when_finished "cd .. && rm -rf tmp" &&
-- 
2.0.0.244.g4e8e734

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] completion: Handle '!f() { ... }; f' aliases

2014-06-12 Thread Steffen Prohaska

On Jun 10, 2014, at 7:27 AM, Junio C Hamano  wrote:

> Steffen Prohaska  writes:
> 
> I tend to prefer writing it like so instead:
> 
>sh -c '...' -
> 
> so that I won't clobber "f" (or any other name).  I wonder if you
> can help users of this other pattern as well.

I'll send an updated patch that handles it.


>> +test_expect_success 'completion uses  completion for alias !f() { 
>> VAR=val git  ... }' '
>> +test_config alias.co "!f() { VAR=val git checkout ... ; } f" &&
> 
> Is it only "f" that is completed, or can I spell it using another
> arbitrary token, e.g.
> 
>   test_config alias.co "!co () { git checkout ... } co"

Any token that starts with ! already worked before. 

The updated patch will also handle spaces before the parens.

Steffen--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] completion: Handle '!f() { ... }; f' aliases

2014-06-09 Thread Steffen Prohaska
'!f() { ... }; f' is a recommended pattern to declare more complex
aliases (see git wiki [1]).  This commit teaches the completion to
handle them.

When determining which completion to use for an alias, the opening brace
is now ignored in order to continue the search for a git command inside
the function body.  For example, the alias '!f() { git commit ... }' now
triggers commit completion.  Previously, the search stopped on '{', and
the completion tried it to determine how to complete, which obviously
was useless.

Furthermore, the null command ':' is now skipped, so that it can be used
as a workaround to declare the desired completion style.  For example,
the alias '!f() { : git commit ; if ...  ' now triggers commit
completion.

[1] https://git.wiki.kernel.org/index.php/Aliases

Signed-off-by: Steffen Prohaska 
---

I changed the tests to use test_config, as Eric suggested.  Thanks.

 contrib/completion/git-completion.bash |  7 +++
 t/t9902-completion.sh  | 18 ++
 2 files changed, 25 insertions(+)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 2c59a76..aecb975 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -21,6 +21,11 @@
 #source ~/.git-completion.sh
 #3) Consider changing your PS1 to also show the current branch,
 #   see git-prompt.sh for details.
+#
+# If you use complex aliases of form '!f() { ... }; f', you can use the null
+# command ':' as the first command in the function body to declare the desired
+# completion style.  For example '!f() { : git commit ; ... }; f' will
+# tell the completion to use commit completion.
 
 case "$COMP_WORDBREAKS" in
 *:*) : great ;;
@@ -781,6 +786,8 @@ __git_aliased_command ()
-*) : option ;;
*=*): setting env ;;
git): git itself ;;
+   {)  : skip start of shell helper function ;;
+   :)  : skip null command ;;
*)
echo "$word"
return
diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index 2d4beb5..10ceb29 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -550,6 +550,24 @@ test_expect_success 'complete files' '
test_completion "git add mom" "momified"
 '
 
+test_expect_success 'completion uses  completion for alias !f() { VAR=val 
git  ... }' '
+   test_config alias.co "!f() { VAR=val git checkout ... ; } f" &&
+   test_completion "git co m" <<-\EOF
+   master Z
+   mybranch Z
+   mytag Z
+   EOF
+'
+
+test_expect_success 'completion used  completion for alias !f() { : git 
 ; ... }' '
+   test_config alias.co "!f() { : git checkout ; if ... } f" &&
+   test_completion "git co m" <<-\EOF
+   master Z
+   mybranch Z
+   mytag Z
+   EOF
+'
+
 test_expect_failure 'complete with tilde expansion' '
git init tmp && cd tmp &&
test_when_finished "cd .. && rm -rf tmp" &&
-- 
2.0.0.244.g4e8e734

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] completion: Handle '!f() { ... }; f' aliases

2014-06-07 Thread Steffen Prohaska
'!f() { ... }; f' is a recommended pattern to declare more complex
aliases (see git wiki [1]).  This commit teaches the completion to
handle them.

When determining which completion to use for an alias, the opening brace
is now ignored in order to continue the search for a git command inside
the function body.  For example, the alias '!f() { git commit ... }' now
triggers commit completion.  Previously, the search stopped on '{', and
the completion tried it to determine how to complete, which obviously
was useless.

Furthermore, the null command ':' is now skipped, so that it can be used
as a workaround to declare the desired completion style.  For example,
the alias '!f() { : git commit ; if ...  ' now triggers commit
completion.

[1] https://git.wiki.kernel.org/index.php/Aliases

Signed-off-by: Steffen Prohaska 
---
 contrib/completion/git-completion.bash |  7 +++
 t/t9902-completion.sh  | 20 
 2 files changed, 27 insertions(+)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 2c59a76..aecb975 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -21,6 +21,11 @@
 #source ~/.git-completion.sh
 #3) Consider changing your PS1 to also show the current branch,
 #   see git-prompt.sh for details.
+#
+# If you use complex aliases of form '!f() { ... }; f', you can use the null
+# command ':' as the first command in the function body to declare the desired
+# completion style.  For example '!f() { : git commit ; ... }; f' will
+# tell the completion to use commit completion.
 
 case "$COMP_WORDBREAKS" in
 *:*) : great ;;
@@ -781,6 +786,8 @@ __git_aliased_command ()
-*) : option ;;
*=*): setting env ;;
git): git itself ;;
+   {)  : skip start of shell helper function ;;
+   :)  : skip null command ;;
*)
echo "$word"
return
diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index 2d4beb5..ea48681 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -550,6 +550,26 @@ test_expect_success 'complete files' '
test_completion "git add mom" "momified"
 '
 
+test_expect_success 'completion uses  completion for alias !f() { VAR=val 
git  ... }' '
+   git config alias.co "!f() { VAR=val git checkout ... ; } f" &&
+   test_completion "git co m" <<-\EOF &&
+   master Z
+   mybranch Z
+   mytag Z
+   EOF
+   git config --unset alias.co
+'
+
+test_expect_success 'completion used  completion for alias !f() { : git 
 ; ... }' '
+   git config alias.co "!f() { : git checkout ; if ... } f" &&
+   test_completion "git co m" <<-\EOF &&
+   master Z
+   mybranch Z
+   mytag Z
+   EOF
+   git config --unset alias.co
+'
+
 test_expect_failure 'complete with tilde expansion' '
git init tmp && cd tmp &&
test_when_finished "cd .. && rm -rf tmp" &&
-- 
2.0.0.244.g4e8e734

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] Git v1.8.4.1

2013-09-28 Thread Steffen Prohaska
Hello Jonathan,

On Sep 27, 2013, at 8:52 PM, Jonathan Nieder  wrote:

> The latest maintenance release Git v1.8.4.1 is now available.

I can't find the following three minor doc fixes

  http://article.gmane.org/gmane.comp.version-control.git/235234
  http://article.gmane.org/gmane.comp.version-control.git/235233
  http://article.gmane.org/gmane.comp.version-control.git/235255

on any of your branches.  I'm wondering whether they got overlooked when you 
took over from Junio.

Steffen--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] git-prune-packed.txt: Fix reference to GIT_OBJECT_DIRECTORY

2013-09-23 Thread Steffen Prohaska
git-prune-packed operates on GIT_OBJECT_DIRECTORY.  See 'environment.c',

git_object_dir = getenv(DB_ENVIRONMENT);

and cache.h,

#define DB_ENVIRONMENT "GIT_OBJECT_DIRECTORY"

Signed-off-by: Steffen Prohaska 
---
 Documentation/git-prune-packed.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-prune-packed.txt 
b/Documentation/git-prune-packed.txt
index 80dc022..6738055 100644
--- a/Documentation/git-prune-packed.txt
+++ b/Documentation/git-prune-packed.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 
 DESCRIPTION
 ---
-This program searches the `$GIT_OBJECT_DIR` for all objects that currently
+This program searches the `$GIT_OBJECT_DIRECTORY` for all objects that 
currently
 exist in a pack file as well as the independent object directories.
 
 All such extra objects are removed.
-- 
1.8.4.477.gfa286b2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] git.txt: Fix asciidoc syntax of --*-pathspecs

2013-09-23 Thread Steffen Prohaska
Labeled lists require a double colon.

Signed-off-by: Steffen Prohaska 
---
 Documentation/git.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 5d68d33..4c2757e 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -475,19 +475,19 @@ example the following invocations are equivalent:
This is equivalent to setting the `GIT_LITERAL_PATHSPECS` environment
variable to `1`.
 
---glob-pathspecs:
+--glob-pathspecs::
Add "glob" magic to all pathspec. This is equivalent to setting
the `GIT_GLOB_PATHSPECS` environment variable to `1`. Disabling
globbing on individual pathspecs can be done using pathspec
magic ":(literal)"
 
---noglob-pathspecs:
+--noglob-pathspecs::
Add "literal" magic to all pathspec. This is equivalent to setting
the `GIT_NOGLOB_PATHSPECS` environment variable to `1`. Enabling
globbing on individual pathspecs can be done using pathspec
magic ":(glob)"
 
---icase-pathspecs:
+--icase-pathspecs::
Add "icase" magic to all pathspec. This is equivalent to setting
the `GIT_ICASE_PATHSPECS` environment variable to `1`.
 
-- 
1.8.4.477.gfa286b2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X

2013-08-21 Thread Steffen Prohaska
Previously, filtering 2GB or more through an external filter (see test)
failed on Mac OS X 10.8.4 (12E55) for a 64-bit executable with:

error: read from external filter cat failed
error: cannot feed the input to external filter cat
error: cat died of signal 13
error: external filter cat failed 141
error: external filter cat failed

The reason was that read() immediately returns with EINVAL if nbyte >=
2GB.  According to POSIX [1], if the value of nbyte passed to read() is
greater than SSIZE_MAX, the result is implementation-defined.  The write
function has the same restriction [2].  Since OS X still supports
running 32-bit executables, the 32-bit limit (SSIZE_MAX = INT_MAX
= 2GB - 1) seems to be also imposed on 64-bit executables under certain
conditions.  For write, the problem has been addressed earlier [6c642a].

This commit addresses the problem for read() and write() by limiting
size of IO chunks unconditionally on all platforms in xread() and
xwrite().  Large chunks only cause problems, like triggering the OS
X bug or causing latencies when killing the process.  Reasonably sized
smaller chunks have no negative impact on performance.

The compat wrapper clipped_write() introduced earlier [6c642a] is not
needed anymore.  It will be reverted in a separate commit.  The new test
catches read and write problems.

Note that 'git add' exits with 0 even if it prints filtering errors to
stderr.  The test, therefore, checks stderr.  'git add' should probably
be changed (sometime in another commit) to exit with nonzero if
filtering fails.  The test could then be changed to use test_must_fail.

Thanks to the following people for suggestions and testing:

Johannes Sixt 
John Keeping 
Jonathan Nieder 
Kyle J. McKay 
Linus Torvalds 
Torsten Bögershausen 

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

[6c642a] commit 6c642a878688adf46b226903858b53e2d31ac5c3
compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

Signed-off-by: Steffen Prohaska 
---
 t/t0021-conversion.sh | 14 ++
 wrapper.c | 12 
 2 files changed, 26 insertions(+)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index e50f0f7..b92e6cb 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -190,4 +190,18 @@ test_expect_success 'required filter clean failure' '
test_must_fail git add test.fc
 '
 
+test -n "$GIT_TEST_LONG" && test_set_prereq EXPENSIVE
+
+test_expect_success EXPENSIVE 'filter large file' '
+   git config filter.largefile.smudge cat &&
+   git config filter.largefile.clean cat &&
+   for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
+   echo "2GB filter=largefile" >.gitattributes &&
+   git add 2GB 2>err &&
+   ! test -s err &&
+   rm -f 2GB &&
+   git checkout -- 2GB 2>err &&
+   ! test -s err
+'
+
 test_done
diff --git a/wrapper.c b/wrapper.c
index 6a015de..66cc727 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -131,6 +131,14 @@ void *xcalloc(size_t nmemb, size_t size)
 }
 
 /*
+ * Limit size of IO chunks, because huge chunks only cause pain.  OS X 64-bit
+ * is buggy, returning EINVAL if len >= INT_MAX; and even in the absense of
+ * bugs, large chunks can result in bad latencies when you decide to kill the
+ * process.
+ */
+#define MAX_IO_SIZE (8*1024*1024)
+
+/*
  * xread() is the same a read(), but it automatically restarts read()
  * operations with a recoverable error (EAGAIN and EINTR). xread()
  * DOES NOT GUARANTEE that "len" bytes is read even if the data is available.
@@ -138,6 +146,8 @@ void *xcalloc(size_t nmemb, size_t size)
 ssize_t xread(int fd, void *buf, size_t len)
 {
ssize_t nr;
+   if (len > MAX_IO_SIZE)
+   len = MAX_IO_SIZE;
while (1) {
nr = read(fd, buf, len);
if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
@@ -154,6 +164,8 @@ ssize_t xread(int fd, void *buf, size_t len)
 ssize_t xwrite(int fd, const void *buf, size_t len)
 {
ssize_t nr;
+   if (len > MAX_IO_SIZE)
+   len = MAX_IO_SIZE;
while (1) {
nr = write(fd, buf, len);
if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
-- 
1.8.4.rc3.5.g4f480ff

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 2/2] Revert "compate/clipped-write.c: large write(2) fails on Mac OS X/XNU"

2013-08-21 Thread Steffen Prohaska
The previous commit introduced a size limit on IO chunks on all
platforms.  The compat clipped_write() is not needed anymore.

This reverts commit 6c642a878688adf46b226903858b53e2d31ac5c3.

Signed-off-by: Steffen Prohaska 
---
 Makefile   |  8 
 compat/clipped-write.c | 13 -
 config.mak.uname   |  1 -
 git-compat-util.h  |  5 -
 4 files changed, 27 deletions(-)
 delete mode 100644 compat/clipped-write.c

diff --git a/Makefile b/Makefile
index 3588ca1..4026211 100644
--- a/Makefile
+++ b/Makefile
@@ -69,9 +69,6 @@ all::
 # Define NO_MSGFMT_EXTENDED_OPTIONS if your implementation of msgfmt
 # doesn't support GNU extensions like --check and --statistics
 #
-# Define NEEDS_CLIPPED_WRITE if your write(2) cannot write more than
-# INT_MAX bytes at once (e.g. MacOS X).
-#
 # Define HAVE_PATHS_H if you have paths.h and want to use the default PATH
 # it specifies.
 #
@@ -1493,11 +1490,6 @@ ifndef NO_MSGFMT_EXTENDED_OPTIONS
MSGFMT += --check --statistics
 endif
 
-ifdef NEEDS_CLIPPED_WRITE
-   BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
-   COMPAT_OBJS += compat/clipped-write.o
-endif
-
 ifneq (,$(XDL_FAST_HASH))
BASIC_CFLAGS += -DXDL_FAST_HASH
 endif
diff --git a/compat/clipped-write.c b/compat/clipped-write.c
deleted file mode 100644
index b8f98ff..000
--- a/compat/clipped-write.c
+++ /dev/null
@@ -1,13 +0,0 @@
-#include "../git-compat-util.h"
-#undef write
-
-/*
- * Version of write that will write at most INT_MAX bytes.
- * Workaround a xnu bug on Mac OS X
- */
-ssize_t clipped_write(int fildes, const void *buf, size_t nbyte)
-{
-   if (nbyte > INT_MAX)
-   nbyte = INT_MAX;
-   return write(fildes, buf, nbyte);
-}
diff --git a/config.mak.uname b/config.mak.uname
index b27f51d..7d61531 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -95,7 +95,6 @@ ifeq ($(uname_S),Darwin)
NO_MEMMEM = YesPlease
USE_ST_TIMESPEC = YesPlease
HAVE_DEV_TTY = YesPlease
-   NEEDS_CLIPPED_WRITE = YesPlease
COMPAT_OBJS += compat/precompose_utf8.o
BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
 endif
diff --git a/git-compat-util.h b/git-compat-util.h
index 115cb1d..96d8881 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -185,11 +185,6 @@ typedef unsigned long uintptr_t;
 #define probe_utf8_pathname_composition(a,b)
 #endif
 
-#ifdef NEEDS_CLIPPED_WRITE
-ssize_t clipped_write(int fildes, const void *buf, size_t nbyte);
-#define write(x,y,z) clipped_write((x),(y),(z))
-#endif
-
 #ifdef MKDIR_WO_TRAILING_SLASH
 #define mkdir(a,b) compat_mkdir_wo_trailing_slash((a),(b))
 extern int compat_mkdir_wo_trailing_slash(const char*, mode_t);
-- 
1.8.4.rc3.5.g4f480ff

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v6 0/2] Fix IO >= 2GB on Mac, fixed typo

2013-08-21 Thread Steffen Prohaska
Fixed typo in comment.

Steffen Prohaska (2):
  xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X
  Revert "compate/clipped-write.c: large write(2) fails on Mac OS X/XNU"

 Makefile   |  8 
 compat/clipped-write.c | 13 -
 config.mak.uname   |  1 -
 git-compat-util.h  |  5 -
 t/t0021-conversion.sh  | 14 ++
 wrapper.c  | 12 
 6 files changed, 26 insertions(+), 27 deletions(-)
 delete mode 100644 compat/clipped-write.c

-- 
1.8.4.rc3.5.g4f480ff

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 0/2] Fix IO of >=2GB on Mac OS X by limiting IO chunks

2013-08-19 Thread Steffen Prohaska
This is the revised patch taking the considerations about IO chunk size into
account.  The series deletes more than it adds and fixes a bug.  Nice.

Steffen Prohaska (2):
  xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X
  Revert "compate/clipped-write.c: large write(2) fails on Mac OS X/XNU"

 Makefile   |  8 
 compat/clipped-write.c | 13 -
 config.mak.uname   |  1 -
 git-compat-util.h  |  5 -
 t/t0021-conversion.sh  | 14 ++
 wrapper.c  | 12 
 6 files changed, 26 insertions(+), 27 deletions(-)
 delete mode 100644 compat/clipped-write.c

-- 
1.8.4.rc3.5.g4f480ff

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X

2013-08-19 Thread Steffen Prohaska
Previously, filtering 2GB or more through an external filter (see test)
failed on Mac OS X 10.8.4 (12E55) for a 64-bit executable with:

error: read from external filter cat failed
error: cannot feed the input to external filter cat
error: cat died of signal 13
error: external filter cat failed 141
error: external filter cat failed

The reason was that read() immediately returns with EINVAL if nbyte >=
2GB.  According to POSIX [1], if the value of nbyte passed to read() is
greater than SSIZE_MAX, the result is implementation-defined.  The write
function has the same restriction [2].  Since OS X still supports
running 32-bit executables, the 32-bit limit (SSIZE_MAX = INT_MAX
= 2GB - 1) seems to be also imposed on 64-bit executables under certain
conditions.  For write, the problem has been addressed earlier [6c642a].

This commit addresses the problem for read() and write() by limiting
size of IO chunks unconditionally on all platforms in xread() and
xwrite().  Large chunks only cause problems, like triggering the OS
X bug or causing latencies when killing the process.  Reasonably sized
smaller chunks have no negative impact on performance.

The compat wrapper clipped_write() introduced earlier [6c642a] is not
needed anymore.  It will be reverted in a separate commit.  The new test
catches read and write problems.

Note that 'git add' exits with 0 even if it prints filtering errors to
stderr.  The test, therefore, checks stderr.  'git add' should probably
be changed (sometime in another commit) to exit with nonzero if
filtering fails.  The test could then be changed to use test_must_fail.

Thanks to the following people for suggestions and testing:

Johannes Sixt 
John Keeping 
Jonathan Nieder 
Kyle J. McKay 
Linus Torvalds 
Torsten Bögershausen 

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

[6c642a] commit 6c642a878688adf46b226903858b53e2d31ac5c3
compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

Signed-off-by: Steffen Prohaska 
---
 t/t0021-conversion.sh | 14 ++
 wrapper.c | 12 
 2 files changed, 26 insertions(+)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index e50f0f7..b92e6cb 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -190,4 +190,18 @@ test_expect_success 'required filter clean failure' '
test_must_fail git add test.fc
 '
 
+test -n "$GIT_TEST_LONG" && test_set_prereq EXPENSIVE
+
+test_expect_success EXPENSIVE 'filter large file' '
+   git config filter.largefile.smudge cat &&
+   git config filter.largefile.clean cat &&
+   for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
+   echo "2GB filter=largefile" >.gitattributes &&
+   git add 2GB 2>err &&
+   ! test -s err &&
+   rm -f 2GB &&
+   git checkout -- 2GB 2>err &&
+   ! test -s err
+'
+
 test_done
diff --git a/wrapper.c b/wrapper.c
index 6a015de..97e3cf7 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -131,6 +131,14 @@ void *xcalloc(size_t nmemb, size_t size)
 }
 
 /*
+ * Limit size of IO chunks, because huge chunks only cause pain.  OS X 64-bit
+ * buggy, returning EINVAL if len >= INT_MAX; and even in the absense of bugs,
+ * large chunks can result in bad latencies when you decide to kill the
+ * process.
+ */
+#define MAX_IO_SIZE (8*1024*1024)
+
+/*
  * xread() is the same a read(), but it automatically restarts read()
  * operations with a recoverable error (EAGAIN and EINTR). xread()
  * DOES NOT GUARANTEE that "len" bytes is read even if the data is available.
@@ -138,6 +146,8 @@ void *xcalloc(size_t nmemb, size_t size)
 ssize_t xread(int fd, void *buf, size_t len)
 {
ssize_t nr;
+   if (len > MAX_IO_SIZE)
+   len = MAX_IO_SIZE;
while (1) {
nr = read(fd, buf, len);
if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
@@ -154,6 +164,8 @@ ssize_t xread(int fd, void *buf, size_t len)
 ssize_t xwrite(int fd, const void *buf, size_t len)
 {
ssize_t nr;
+   if (len > MAX_IO_SIZE)
+   len = MAX_IO_SIZE;
while (1) {
nr = write(fd, buf, len);
if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
-- 
1.8.4.rc3.5.g4f480ff

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 2/2] Revert "compate/clipped-write.c: large write(2) fails on Mac OS X/XNU"

2013-08-19 Thread Steffen Prohaska
The previous commit introduced a size limit on IO chunks on all
platforms.  The compat clipped_write() is not needed anymore.

This reverts commit 6c642a878688adf46b226903858b53e2d31ac5c3.

Signed-off-by: Steffen Prohaska 
---
 Makefile   |  8 
 compat/clipped-write.c | 13 -
 config.mak.uname   |  1 -
 git-compat-util.h  |  5 -
 4 files changed, 27 deletions(-)
 delete mode 100644 compat/clipped-write.c

diff --git a/Makefile b/Makefile
index 3588ca1..4026211 100644
--- a/Makefile
+++ b/Makefile
@@ -69,9 +69,6 @@ all::
 # Define NO_MSGFMT_EXTENDED_OPTIONS if your implementation of msgfmt
 # doesn't support GNU extensions like --check and --statistics
 #
-# Define NEEDS_CLIPPED_WRITE if your write(2) cannot write more than
-# INT_MAX bytes at once (e.g. MacOS X).
-#
 # Define HAVE_PATHS_H if you have paths.h and want to use the default PATH
 # it specifies.
 #
@@ -1493,11 +1490,6 @@ ifndef NO_MSGFMT_EXTENDED_OPTIONS
MSGFMT += --check --statistics
 endif
 
-ifdef NEEDS_CLIPPED_WRITE
-   BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
-   COMPAT_OBJS += compat/clipped-write.o
-endif
-
 ifneq (,$(XDL_FAST_HASH))
BASIC_CFLAGS += -DXDL_FAST_HASH
 endif
diff --git a/compat/clipped-write.c b/compat/clipped-write.c
deleted file mode 100644
index b8f98ff..000
--- a/compat/clipped-write.c
+++ /dev/null
@@ -1,13 +0,0 @@
-#include "../git-compat-util.h"
-#undef write
-
-/*
- * Version of write that will write at most INT_MAX bytes.
- * Workaround a xnu bug on Mac OS X
- */
-ssize_t clipped_write(int fildes, const void *buf, size_t nbyte)
-{
-   if (nbyte > INT_MAX)
-   nbyte = INT_MAX;
-   return write(fildes, buf, nbyte);
-}
diff --git a/config.mak.uname b/config.mak.uname
index b27f51d..7d61531 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -95,7 +95,6 @@ ifeq ($(uname_S),Darwin)
NO_MEMMEM = YesPlease
USE_ST_TIMESPEC = YesPlease
HAVE_DEV_TTY = YesPlease
-   NEEDS_CLIPPED_WRITE = YesPlease
COMPAT_OBJS += compat/precompose_utf8.o
BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
 endif
diff --git a/git-compat-util.h b/git-compat-util.h
index 115cb1d..96d8881 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -185,11 +185,6 @@ typedef unsigned long uintptr_t;
 #define probe_utf8_pathname_composition(a,b)
 #endif
 
-#ifdef NEEDS_CLIPPED_WRITE
-ssize_t clipped_write(int fildes, const void *buf, size_t nbyte);
-#define write(x,y,z) clipped_write((x),(y),(z))
-#endif
-
 #ifdef MKDIR_WO_TRAILING_SLASH
 #define mkdir(a,b) compat_mkdir_wo_trailing_slash((a),(b))
 extern int compat_mkdir_wo_trailing_slash(const char*, mode_t);
-- 
1.8.4.rc3.5.g4f480ff

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] compat: Fix read() of 2GB and more on Mac OS X

2013-08-19 Thread Steffen Prohaska

On Aug 19, 2013, at 6:04 PM, Linus Torvalds  
wrote:

> I hate your patch for other reasons, though:
> 
>> The problem for read() is addressed in a similar way by introducing
>> a wrapper function in compat that always reads less than 2GB.
> 
> Why do you do that? We already _have_ wrapper functions for read(),
> namely xread().  Exactly because you basically have to, in order to
> handle signals on interruptible filesystems (which aren't POSIX
> either, but at least sanely so) or from other random sources. And to
> handle the "you can't do reads that big" issue.
> 
> So why isn't the patch much more straightforward? 

The first version was more straightforward [1].  But reviewers suggested
that the compat wrappers would be the right way to do it and showed me
that it has been done like this before [2].

I haven't submitted anything in a while, so I tried to be a kind person
and followed the suggestions.  I started to hate the patch a bit (maybe less
than you), but I wasn't brave enough to reject the suggestions.  This is
why the patch became what it is.

I'm happy to rework it again towards your suggestion.  I would also remove
the compat wrapper for write().  But I got a bit tired.  I'd appreciate if
I received more indication whether a version without compat wrappers would
be accepted.

Steffen

[1] http://article.gmane.org/gmane.comp.version-control.git/232455
[2] 6c642a8 compate/clipped-write.c: large write(2) fails on Mac OS X/XNU--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4] compat: Fix read() of 2GB and more on Mac OS X

2013-08-19 Thread Steffen Prohaska
Previously, filtering 2GB or more through an external filter (see test)
failed on Mac OS X 10.8.4 (12E55) for a 64-bit executable with:

error: read from external filter cat failed
error: cannot feed the input to external filter cat
error: cat died of signal 13
error: external filter cat failed 141
error: external filter cat failed

The reason was that read() immediately returns with EINVAL if nbyte >=
2GB.  According to POSIX [1], if the value of nbyte passed to read() is
greater than SSIZE_MAX, the result is implementation-defined.  The write
function has the same restriction [2].  Since OS X still supports
running 32-bit executables, the 32-bit limit (SSIZE_MAX = INT_MAX
= 2GB - 1) seems to be also imposed on 64-bit executables under certain
conditions.  For write, the problem has been addressed in a earlier
commit [6c642a].

The problem for read() is addressed in a similar way by introducing
a wrapper function in compat that always reads less than 2GB.  It is
very likely that the read() and write() wrappers are always used
together.  To avoid introducing another option, NEEDS_CLIPPED_WRITE is
changed to NEEDS_CLIPPED_IO and used to activate both wrappers.

To avoid expanding the read compat macro in constructs like
'vtbl->read(...)', 'read' is renamed to 'readfn' in two cases.  The
solution seems more robust than using '#undef read'.

Note that 'git add' exits with 0 even if it prints filtering errors to
stderr.  The test, therefore, checks stderr.  'git add' should probably
be changed (sometime in another commit) to exit with nonzero if
filtering fails.  The test could then be changed to use test_must_fail.

Thanks to the following people for their suggestions:

Johannes Sixt 
John Keeping 
Jonathan Nieder 
Kyle J. McKay 
Torsten Bögershausen 
Eric Sunshine 

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

[6c642a] 6c642a878688adf46b226903858b53e2d31ac5c3
compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

Signed-off-by: Steffen Prohaska 
---
 Makefile | 10 +-
 builtin/var.c| 10 +-
 compat/{clipped-write.c => clipped-io.c} | 11 ++-
 config.mak.uname |  2 +-
 git-compat-util.h|  5 -
 streaming.c  |  4 ++--
 t/t0021-conversion.sh| 14 ++
 7 files changed, 41 insertions(+), 15 deletions(-)
 rename compat/{clipped-write.c => clipped-io.c} (53%)

diff --git a/Makefile b/Makefile
index 3588ca1..f54134d 100644
--- a/Makefile
+++ b/Makefile
@@ -69,8 +69,8 @@ all::
 # Define NO_MSGFMT_EXTENDED_OPTIONS if your implementation of msgfmt
 # doesn't support GNU extensions like --check and --statistics
 #
-# Define NEEDS_CLIPPED_WRITE if your write(2) cannot write more than
-# INT_MAX bytes at once (e.g. MacOS X).
+# Define NEEDS_CLIPPED_IO if your read(2) and/or write(2) cannot handle more
+# than INT_MAX bytes at once (e.g. Mac OS X).
 #
 # Define HAVE_PATHS_H if you have paths.h and want to use the default PATH
 # it specifies.
@@ -1493,9 +1493,9 @@ ifndef NO_MSGFMT_EXTENDED_OPTIONS
MSGFMT += --check --statistics
 endif
 
-ifdef NEEDS_CLIPPED_WRITE
-   BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
-   COMPAT_OBJS += compat/clipped-write.o
+ifdef NEEDS_CLIPPED_IO
+   BASIC_CFLAGS += -DNEEDS_CLIPPED_IO
+   COMPAT_OBJS += compat/clipped-io.o
 endif
 
 ifneq (,$(XDL_FAST_HASH))
diff --git a/builtin/var.c b/builtin/var.c
index aedbb53..06f8459 100644
--- a/builtin/var.c
+++ b/builtin/var.c
@@ -28,7 +28,7 @@ static const char *pager(int flag)
 
 struct git_var {
const char *name;
-   const char *(*read)(int);
+   const char *(*readfn)(int);
 };
 static struct git_var git_vars[] = {
{ "GIT_COMMITTER_IDENT", git_committer_info },
@@ -43,8 +43,8 @@ static void list_vars(void)
struct git_var *ptr;
const char *val;
 
-   for (ptr = git_vars; ptr->read; ptr++)
-   if ((val = ptr->read(0)))
+   for (ptr = git_vars; ptr->readfn; ptr++)
+   if ((val = ptr->readfn(0)))
printf("%s=%s\n", ptr->name, val);
 }
 
@@ -53,9 +53,9 @@ static const char *read_var(const char *var)
struct git_var *ptr;
const char *val;
val = NULL;
-   for (ptr = git_vars; ptr->read; ptr++) {
+   for (ptr = git_vars; ptr->readfn; ptr++) {
if (strcmp(var, ptr->name) == 0) {
-   val = ptr->read(IDENT_STRICT);
+   val = ptr->readfn(IDENT_STRICT);
break;
}
}
diff --git a/compat/clipped-write.c b/compat/clipped-io.c
similarity index 53%
rename fr

Re: [PATCH v2] compat: Fix read() of 2GB and more on Mac OS X

2013-08-19 Thread Steffen Prohaska

On Aug 19, 2013, at 10:20 AM, Johannes Sixt  wrote:

> Am 19.08.2013 08:38, schrieb Steffen Prohaska:
>> +test_expect_success EXPENSIVE 'filter large file' '
>> +git config filter.largefile.smudge cat &&
>> +git config filter.largefile.clean cat &&
>> +for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
> 
> Shouldn't you count to 2049 to get a file that is over 2GB?

No.  INT_MAX = 2GB - 1 works.  INT_MAX + 1 = 2GB fails.  It tests exactly at 
the boundary.

Steffen

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] compat: Fix read() of 2GB and more on Mac OS X

2013-08-19 Thread Steffen Prohaska
Previously, filtering 2GB or more through an external filter (see test)
failed on Mac OS X 10.8.4 (12E55) for a 64-bit executable with:

error: read from external filter cat failed
error: cannot feed the input to external filter cat
error: cat died of signal 13
error: external filter cat failed 141
error: external filter cat failed

The reason was that read() immediately returns with EINVAL if nbyte >=
2GB.  According to POSIX [1], if the value of nbyte passed to read() is
greater than SSIZE_MAX, the result is implementation-defined.  The write
function has the same restriction [2].  Since OS X still supports
running 32-bit executables, the 32-bit limit (SSIZE_MAX = INT_MAX
= 2GB - 1) seems to be also imposed on 64-bit executables under certain
conditions.  For write, the problem has been addressed in a earlier
commit [6c642a].

The problem for read() is addressed in a similar way by introducing
a wrapper function in compat that always reads less than 2GB.
Unfortunately, '#undef read' is needed at a few places to avoid
expanding the compat macro in constructs like 'vtbl->read(...)'.

Note that 'git add' exits with 0 even if it prints filtering errors to
stderr.  The test, therefore, checks stderr.  'git add' should probably
be changed (sometime in another commit) to exit with nonzero if
filtering fails.  The test could then be changed to use test_must_fail.

Thanks to the following people for their suggestions:

Johannes Sixt 
John Keeping 
Jonathan Nieder 
Kyle J. McKay 
Torsten Bögershausen 

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

[6c642a] 6c642a878688adf46b226903858b53e2d31ac5c3
compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

Signed-off-by: Steffen Prohaska 
---
 Makefile  |  8 
 builtin/var.c |  1 +
 compat/clipped-read.c | 13 +
 config.mak.uname  |  1 +
 git-compat-util.h |  5 +
 streaming.c   |  1 +
 t/t0021-conversion.sh | 14 ++
 7 files changed, 43 insertions(+)
 create mode 100644 compat/clipped-read.c

diff --git a/Makefile b/Makefile
index 3588ca1..0f69e24 100644
--- a/Makefile
+++ b/Makefile
@@ -69,6 +69,9 @@ all::
 # Define NO_MSGFMT_EXTENDED_OPTIONS if your implementation of msgfmt
 # doesn't support GNU extensions like --check and --statistics
 #
+# Define NEEDS_CLIPPED_READ if your read(2) cannot read more than
+# INT_MAX bytes at once (e.g. MacOS X).
+#
 # Define NEEDS_CLIPPED_WRITE if your write(2) cannot write more than
 # INT_MAX bytes at once (e.g. MacOS X).
 #
@@ -1493,6 +1496,11 @@ ifndef NO_MSGFMT_EXTENDED_OPTIONS
MSGFMT += --check --statistics
 endif
 
+ifdef NEEDS_CLIPPED_READ
+   BASIC_CFLAGS += -DNEEDS_CLIPPED_READ
+   COMPAT_OBJS += compat/clipped-read.o
+endif
+
 ifdef NEEDS_CLIPPED_WRITE
BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
COMPAT_OBJS += compat/clipped-write.o
diff --git a/builtin/var.c b/builtin/var.c
index aedbb53..e59f5ba 100644
--- a/builtin/var.c
+++ b/builtin/var.c
@@ -38,6 +38,7 @@ static struct git_var git_vars[] = {
{ "", NULL },
 };
 
+#undef read
 static void list_vars(void)
 {
struct git_var *ptr;
diff --git a/compat/clipped-read.c b/compat/clipped-read.c
new file mode 100644
index 000..6962f67
--- /dev/null
+++ b/compat/clipped-read.c
@@ -0,0 +1,13 @@
+#include "../git-compat-util.h"
+#undef read
+
+/*
+ * Version of read that will write at most INT_MAX bytes.
+ * Workaround a xnu bug on Mac OS X
+ */
+ssize_t clipped_read(int fd, void *buf, size_t nbyte)
+{
+   if (nbyte > INT_MAX)
+   nbyte = INT_MAX;
+   return read(fd, buf, nbyte);
+}
diff --git a/config.mak.uname b/config.mak.uname
index b27f51d..5c10726 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -95,6 +95,7 @@ ifeq ($(uname_S),Darwin)
NO_MEMMEM = YesPlease
USE_ST_TIMESPEC = YesPlease
HAVE_DEV_TTY = YesPlease
+   NEEDS_CLIPPED_READ = YesPlease
NEEDS_CLIPPED_WRITE = YesPlease
COMPAT_OBJS += compat/precompose_utf8.o
BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
diff --git a/git-compat-util.h b/git-compat-util.h
index 115cb1d..a227127 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -185,6 +185,11 @@ typedef unsigned long uintptr_t;
 #define probe_utf8_pathname_composition(a,b)
 #endif
 
+#ifdef NEEDS_CLIPPED_READ
+ssize_t clipped_read(int fd, void *buf, size_t nbyte);
+#define read(x,y,z) clipped_read((x),(y),(z))
+#endif
+
 #ifdef NEEDS_CLIPPED_WRITE
 ssize_t clipped_write(int fildes, const void *buf, size_t nbyte);
 #define write(x,y,z) clipped_write((x),(y),(z))
diff --git a/streaming.c b/streaming.c
index debe904..c1fe34a 100644
--- a/streaming.c
+++ b/streaming.c
@@ -99,6 +99,7 @@ int close_istream(struct git_istream *st)
return r;
 }
 
+#und

Re: [PATCH v2] compat: Fix read() of 2GB and more on Mac OS X

2013-08-19 Thread Steffen Prohaska

On Aug 19, 2013, at 9:54 AM, John Keeping  wrote:

> You've created compat/clipped-read.c, but...
> 
>> Makefile  |  8 
>> builtin/var.c |  1 +
>> config.mak.uname  |  1 +
>> git-compat-util.h |  5 +
>> streaming.c   |  1 +
>> t/t0021-conversion.sh | 14 ++
>> 6 files changed, 30 insertions(+)
> 
> ... it's not included here.  Did you forget to "git add"?

Indeed.  How embarrassing.  Thanks for spotting this.  I'll send v3 in a minute.

Stefffen--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] compat: Fix read() of 2GB and more on Mac OS X

2013-08-18 Thread Steffen Prohaska
Previously, filtering 2GB or more through an external filter (see test)
failed on Mac OS X 10.8.4 (12E55) for a 64-bit executable with:

error: read from external filter cat failed
error: cannot feed the input to external filter cat
error: cat died of signal 13
error: external filter cat failed 141
error: external filter cat failed

The reason was that read() immediately returns with EINVAL if nbyte >=
2GB.  According to POSIX [1], if the value of nbyte passed to read() is
greater than SSIZE_MAX, the result is implementation-defined.  The write
function has the same restriction [2].  Since OS X still supports
running 32-bit executables, the 32-bit limit (SSIZE_MAX = INT_MAX
= 2GB - 1) seems to be also imposed on 64-bit executables under certain
conditions.  For write, the problem has been addressed in a earlier
commit [6c642a].

The problem for read() is addressed in a similar way by introducing
a wrapper function in compat that always reads less than 2GB.
Unfortunately, '#undef read' is needed at a few places to avoid
expanding the compat macro in constructs like 'vtbl->read(...)'.

Note that 'git add' exits with 0 even if it prints filtering errors to
stderr.  The test, therefore, checks stderr.  'git add' should probably
be changed (sometime in another commit) to exit with nonzero if
filtering fails.  The test could then be changed to use test_must_fail.

Thanks to the following people for their suggestions:

Johannes Sixt 
John Keeping 
Jonathan Nieder 
Kyle J. McKay 
Torsten Bögershausen 

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

[6c642a] 6c642a878688adf46b226903858b53e2d31ac5c3
compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

Signed-off-by: Steffen Prohaska 
---
 Makefile  |  8 
 builtin/var.c |  1 +
 config.mak.uname  |  1 +
 git-compat-util.h |  5 +
 streaming.c   |  1 +
 t/t0021-conversion.sh | 14 ++
 6 files changed, 30 insertions(+)

diff --git a/Makefile b/Makefile
index 3588ca1..0f69e24 100644
--- a/Makefile
+++ b/Makefile
@@ -69,6 +69,9 @@ all::
 # Define NO_MSGFMT_EXTENDED_OPTIONS if your implementation of msgfmt
 # doesn't support GNU extensions like --check and --statistics
 #
+# Define NEEDS_CLIPPED_READ if your read(2) cannot read more than
+# INT_MAX bytes at once (e.g. MacOS X).
+#
 # Define NEEDS_CLIPPED_WRITE if your write(2) cannot write more than
 # INT_MAX bytes at once (e.g. MacOS X).
 #
@@ -1493,6 +1496,11 @@ ifndef NO_MSGFMT_EXTENDED_OPTIONS
MSGFMT += --check --statistics
 endif
 
+ifdef NEEDS_CLIPPED_READ
+   BASIC_CFLAGS += -DNEEDS_CLIPPED_READ
+   COMPAT_OBJS += compat/clipped-read.o
+endif
+
 ifdef NEEDS_CLIPPED_WRITE
BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
COMPAT_OBJS += compat/clipped-write.o
diff --git a/builtin/var.c b/builtin/var.c
index aedbb53..e59f5ba 100644
--- a/builtin/var.c
+++ b/builtin/var.c
@@ -38,6 +38,7 @@ static struct git_var git_vars[] = {
{ "", NULL },
 };
 
+#undef read
 static void list_vars(void)
 {
struct git_var *ptr;
diff --git a/config.mak.uname b/config.mak.uname
index b27f51d..5c10726 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -95,6 +95,7 @@ ifeq ($(uname_S),Darwin)
NO_MEMMEM = YesPlease
USE_ST_TIMESPEC = YesPlease
HAVE_DEV_TTY = YesPlease
+   NEEDS_CLIPPED_READ = YesPlease
NEEDS_CLIPPED_WRITE = YesPlease
COMPAT_OBJS += compat/precompose_utf8.o
BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
diff --git a/git-compat-util.h b/git-compat-util.h
index 115cb1d..a227127 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -185,6 +185,11 @@ typedef unsigned long uintptr_t;
 #define probe_utf8_pathname_composition(a,b)
 #endif
 
+#ifdef NEEDS_CLIPPED_READ
+ssize_t clipped_read(int fd, void *buf, size_t nbyte);
+#define read(x,y,z) clipped_read((x),(y),(z))
+#endif
+
 #ifdef NEEDS_CLIPPED_WRITE
 ssize_t clipped_write(int fildes, const void *buf, size_t nbyte);
 #define write(x,y,z) clipped_write((x),(y),(z))
diff --git a/streaming.c b/streaming.c
index debe904..c1fe34a 100644
--- a/streaming.c
+++ b/streaming.c
@@ -99,6 +99,7 @@ int close_istream(struct git_istream *st)
return r;
 }
 
+#undef read
 ssize_t read_istream(struct git_istream *st, void *buf, size_t sz)
 {
return st->vtbl->read(st, buf, sz);
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index e50f0f7..b92e6cb 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -190,4 +190,18 @@ test_expect_success 'required filter clean failure' '
test_must_fail git add test.fc
 '
 
+test -n "$GIT_TEST_LONG" && test_set_prereq EXPENSIVE
+
+test_expect_success EXPENSIVE 'filter large file' '
+   git config filter

[PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X

2013-08-17 Thread Steffen Prohaska
Previously, filtering more than 2GB through an external filter (see
test) failed on Mac OS X 10.8.4 (12E55) with:

error: read from external filter cat failed
error: cannot feed the input to external filter cat
error: cat died of signal 13
error: external filter cat failed 141
error: external filter cat failed

The reason is that read() immediately returns with EINVAL if len >= 2GB.
I haven't found any information under which specific conditions this
occurs.  My suspicion is that it happens when reading from a pipe, while
reading from a standard file should always be fine.  I haven't tested
any other version of Mac OS X, though I'd expect that other versions are
affected as well.

The problem is fixed by always reading less than 2GB in xread().
xread() doesn't guarantee to read all the requested data at once, and
callers are expected to gracefully handle partial reads.  Slicing large
reads into 2GB pieces should not hurt practical performance.

Signed-off-by: Steffen Prohaska 
---
 t/t0021-conversion.sh | 9 +
 wrapper.c | 8 
 2 files changed, 17 insertions(+)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index e50f0f7..aec1253 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -190,4 +190,13 @@ test_expect_success 'required filter clean failure' '
test_must_fail git add test.fc
 '
 
+test_expect_success 'filter large file' '
+   git config filter.largefile.smudge cat &&
+   git config filter.largefile.clean cat &&
+   dd if=/dev/zero of=2GB count=2097152 bs=1024 &&
+   echo "/2GB filter=largefile" >.gitattributes &&
+   git add 2GB 2>err &&
+   ! grep -q "error" err
+'
+
 test_done
diff --git a/wrapper.c b/wrapper.c
index 6a015de..2a2f496 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -139,6 +139,14 @@ ssize_t xread(int fd, void *buf, size_t len)
 {
ssize_t nr;
while (1) {
+#ifdef __APPLE__
+   const size_t twoGB = (1l << 31);
+   /* len >= 2GB immediately fails on Mac OS X with EINVAL when
+* reading from pipe. */
+   if (len >= twoGB) {
+   len = twoGB - 1;
+   }
+#endif
nr = read(fd, buf, len);
if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
continue;
-- 
1.8.4.rc3.5.gfcb973a

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] shortlog: Fix wrapping lines of wraplen (was broken since recent off-by-one fix)

2012-12-10 Thread Steffen Prohaska
A recent commit [1] fixed a off-by-one wrapping error.  As
a side-effect, the conditional in add_wrapped_shortlog_msg() whether to
append a newline needs to be removed.  add_wrapped_shortlog_msg() should
always append a newline, which was the case before the off-by-one fix,
because strbuf_add_wrapped_text() never returned a value of wraplen.

[1] 14e1a4e1ff70aff36db3f5d2a8b806efd0134d50 utf8: fix off-by-one
wrapping of text

Signed-off-by: Steffen Prohaska 
---
 builtin/shortlog.c  |  5 ++---
 t/t4201-shortlog.sh | 24 
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index b316cf3..8360514 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -306,9 +306,8 @@ parse_done:
 static void add_wrapped_shortlog_msg(struct strbuf *sb, const char *s,
 const struct shortlog *log)
 {
-   int col = strbuf_add_wrapped_text(sb, s, log->in1, log->in2, log->wrap);
-   if (col != log->wrap)
-   strbuf_addch(sb, '\n');
+   strbuf_add_wrapped_text(sb, s, log->in1, log->in2, log->wrap);
+   strbuf_addch(sb, '\n');
 }
 
 void shortlog_output(struct shortlog *log)
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 6872ba1..02ac978 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -120,6 +120,30 @@ test_expect_success 'shortlog from non-git directory' '
test_cmp expect out
 '
 
+test_expect_success 'shortlog should add newline when input line matches 
wraplen' '
+   cat >expect <<\EOF &&
+A U Thor (2):
+  bb:  bbb  bbb bb  bbb b bb
+  aa: aa aa   aa  aa aaa
+
+EOF
+   git shortlog -w >out <<\EOF &&
+commit 0001
+Author: A U Thor 
+Date:   Thu Apr 7 15:14:13 2005 -0700
+
+aa: aa aa   aa  aa aaa
+
+commit 0002
+Author: A U Thor 
+Date:   Thu Apr 7 15:14:13 2005 -0700
+
+bb:  bbb  bbb bb  bbb b bb
+
+EOF
+   test_cmp expect out
+'
+
 iconvfromutf8toiso88591() {
printf "%s" "$*" | iconv -f UTF-8 -t ISO8859-1
 }
-- 
1.8.1.rc1.2.gfb98a3a

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] Re: [PATCH] shortlog: Fix wrapping lines of wraplen

2012-12-10 Thread Steffen Prohaska
On Dec 9, 2012, at 10:36 AM, Junio C Hamano  wrote:

> Steffen Prohaska  writes:
> 
> > A recent commit [1] fixed a off-by-one wrapping error.  As
> > a side-effect, add_wrapped_shortlog_msg() needs to be changed to always
> > append a newline.
> 
> Could you clarify "As a side effect" a bit more?  Do you mean
> something like this?

See updated patches below.

Steffen

Steffen Prohaska (2):
  shortlog: Fix wrapping lines of wraplen (was broken since recent
off-by-one fix)
  strbuf_add_wrapped*(): Remove unused return value

 builtin/shortlog.c  |  5 ++---
 t/t4201-shortlog.sh | 24 
 utf8.c  | 13 ++---
 utf8.h  |  4 ++--
 4 files changed, 34 insertions(+), 12 deletions(-)

-- 
1.8.1.rc1.2.gfb98a3a

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] strbuf_add_wrapped*(): Remove unused return value

2012-12-10 Thread Steffen Prohaska
Since shortlog isn't using the return value anymore (see previous
commit), the functions can be changed to void.

Signed-off-by: Steffen Prohaska 
---
 utf8.c | 13 ++---
 utf8.h |  4 ++--
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/utf8.c b/utf8.c
index 5c61bbe..a4ee665 100644
--- a/utf8.c
+++ b/utf8.c
@@ -323,7 +323,7 @@ static size_t display_mode_esc_sequence_len(const char *s)
  * If indent is negative, assume that already -indent columns have been
  * consumed (and no extra indent is necessary for the first line).
  */
-int strbuf_add_wrapped_text(struct strbuf *buf,
+void strbuf_add_wrapped_text(struct strbuf *buf,
const char *text, int indent1, int indent2, int width)
 {
int indent, w, assume_utf8 = 1;
@@ -332,7 +332,7 @@ int strbuf_add_wrapped_text(struct strbuf *buf,
 
if (width <= 0) {
strbuf_add_indented_text(buf, text, indent1, indent2);
-   return 1;
+   return;
}
 
 retry:
@@ -356,14 +356,14 @@ retry:
if (w <= width || !space) {
const char *start = bol;
if (!c && text == start)
-   return w;
+   return;
if (space)
start = space;
else
strbuf_addchars(buf, ' ', indent);
strbuf_add(buf, start, text - start);
if (!c)
-   return w;
+   return;
space = text;
if (c == '\t')
w |= 0x07;
@@ -405,13 +405,12 @@ new_line:
}
 }
 
-int strbuf_add_wrapped_bytes(struct strbuf *buf, const char *data, int len,
+void strbuf_add_wrapped_bytes(struct strbuf *buf, const char *data, int len,
 int indent, int indent2, int width)
 {
char *tmp = xstrndup(data, len);
-   int r = strbuf_add_wrapped_text(buf, tmp, indent, indent2, width);
+   strbuf_add_wrapped_text(buf, tmp, indent, indent2, width);
free(tmp);
-   return r;
 }
 
 int is_encoding_utf8(const char *name)
diff --git a/utf8.h b/utf8.h
index 93ef600..a214238 100644
--- a/utf8.h
+++ b/utf8.h
@@ -9,9 +9,9 @@ int is_utf8(const char *text);
 int is_encoding_utf8(const char *name);
 int same_encoding(const char *, const char *);
 
-int strbuf_add_wrapped_text(struct strbuf *buf,
+void strbuf_add_wrapped_text(struct strbuf *buf,
const char *text, int indent, int indent2, int width);
-int strbuf_add_wrapped_bytes(struct strbuf *buf, const char *data, int len,
+void strbuf_add_wrapped_bytes(struct strbuf *buf, const char *data, int len,
 int indent, int indent2, int width);
 
 #ifndef NO_ICONV
-- 
1.8.1.rc1.2.gfb98a3a

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] shortlog: Fix wrapping lines of wraplen (was broken since recent off-by-one fix)

2012-12-08 Thread Steffen Prohaska
A recent commit [1] fixed a off-by-one wrapping error.  As
a side-effect, add_wrapped_shortlog_msg() needs to be changed to always
append a newline.

[1] 14e1a4e1ff70aff36db3f5d2a8b806efd0134d50 utf8: fix off-by-one
wrapping of text

Signed-off-by: Steffen Prohaska 
---
 builtin/shortlog.c  |  3 +--
 t/t4201-shortlog.sh | 24 
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index b316cf3..db5b57d 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -307,8 +307,7 @@ static void add_wrapped_shortlog_msg(struct strbuf *sb, 
const char *s,
 const struct shortlog *log)
 {
int col = strbuf_add_wrapped_text(sb, s, log->in1, log->in2, log->wrap);
-   if (col != log->wrap)
-   strbuf_addch(sb, '\n');
+   strbuf_addch(sb, '\n');
 }
 
 void shortlog_output(struct shortlog *log)
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 6872ba1..02ac978 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -120,6 +120,30 @@ test_expect_success 'shortlog from non-git directory' '
test_cmp expect out
 '
 
+test_expect_success 'shortlog should add newline when input line matches 
wraplen' '
+   cat >expect <<\EOF &&
+A U Thor (2):
+  bb:  bbb  bbb bb  bbb b bb
+  aa: aa aa   aa  aa aaa
+
+EOF
+   git shortlog -w >out <<\EOF &&
+commit 0001
+Author: A U Thor 
+Date:   Thu Apr 7 15:14:13 2005 -0700
+
+aa: aa aa   aa  aa aaa
+
+commit 0002
+Author: A U Thor 
+Date:   Thu Apr 7 15:14:13 2005 -0700
+
+bb:  bbb  bbb bb  bbb b bb
+
+EOF
+   test_cmp expect out
+'
+
 iconvfromutf8toiso88591() {
printf "%s" "$*" | iconv -f UTF-8 -t ISO8859-1
 }
-- 
1.8.1.rc1.2.gfb98a3a

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Is anyone working on a next-gen Git protocol?

2012-10-10 Thread Steffen Prohaska
On Oct 8, 2012, at 6:27 PM, Junio C Hamano wrote:

> Once we go into "want/have" phase, I do not think there is a need
> for fundamental change in the protocol (by this, I am not counting a
> change to send "have"s sparsely and possibly backtracking to bisect
> history, etc. as "fundamental").

I've recently discovered that the current protocol can be amazingly
inefficient when it comes to transferring binary objects.  Assuming two
repositories that are in sync.  After a 'git checkout --orphan && git
commit', a subsequent transfers sends all the blobs attached to the new
commit, although the other side already has all the blobs.

This behavior is especially annoying when (mis)using git to store binary
files.  I was thinking for a while that it might be a reasonable idea to
store binary files in a submodule and frequently cut the history in
order to save space.  The history would have little value anyway, since
diff and merge don't make much sense with binary files.

Eventually, I abandoned the idea due to the current behavior of the
protocol.  I had expected that git would be smarter and behave more like
rsync, for example, by skipping big blobs as soon as it recognizes that
they are already available at both sides.

Maybe the new protocol could include an optimization for the described
case.  I don't know whether this would be a fundamental change.

Steffen

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-web--browse: Fix open HTML help pages from iTerm

2012-09-27 Thread Steffen Prohaska

On Sep 27, 2012, at 9:11 PM, Junio C Hamano wrote:

> Steffen Prohaska  writes:
> 
>> iTerm is an alternative to the default terminal emulation program on Mac
>> OS X.  git-web--browse wasn't aware of iTerm and failed to open HTML
>> help pages when used in a shell session running in iTerm, reporting "No
>> known browser available."  Now it works as expected.
>> 
>> Signed-off-by: Steffen Prohaska 
>> ---
>> git-web--browse.sh | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/git-web--browse.sh b/git-web--browse.sh
>> index 1e82726..95ecf65 100755
>> --- a/git-web--browse.sh
>> +++ b/git-web--browse.sh
>> @@ -120,7 +120,8 @@ if test -z "$browser" ; then
>>  fi
>>  # SECURITYSESSIONID indicates an OS X GUI login session
>>  if test -n "$SECURITYSESSIONID" \
>> --o "$TERM_PROGRAM" = "Apple_Terminal" ; then
>> +-o "$TERM_PROGRAM" = "Apple_Terminal" \
>> +-o "$TERM_PROGRAM" = "iTerm.app" ; then
>>  browser_candidates="open $browser_candidates"
>>  fi
> 
> I do not have anything against iTerm, but could we have a solution
> that does not force us to keep adding 47 different terminal program
> names to the list over the longer term (no pun intended)?  For
> example, "If on OS-X (which by the way does not seem to be checked
> with the current logic) and environment TERM_PROGRAM is set to any
> value", or something.

I googled a bit and it seems that TERM_PROGRAM is specific to OS X.
So simply testing whether TERM_PROGRAM is set to any value (without
additional check for OS X) might be good enough.

I am wondering whether anyone knows if TERM_PROGRAM is used on other
operating systems besides OS X.

Steffen
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] git-web--browse: Fix open HTML help pages from iTerm

2012-09-25 Thread Steffen Prohaska
iTerm is an alternative to the default terminal emulation program on Mac
OS X.  git-web--browse wasn't aware of iTerm and failed to open HTML
help pages when used in a shell session running in iTerm, reporting "No
known browser available."  Now it works as expected.

Signed-off-by: Steffen Prohaska 
---
 git-web--browse.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/git-web--browse.sh b/git-web--browse.sh
index 1e82726..95ecf65 100755
--- a/git-web--browse.sh
+++ b/git-web--browse.sh
@@ -120,7 +120,8 @@ if test -z "$browser" ; then
fi
# SECURITYSESSIONID indicates an OS X GUI login session
if test -n "$SECURITYSESSIONID" \
-   -o "$TERM_PROGRAM" = "Apple_Terminal" ; then
+   -o "$TERM_PROGRAM" = "Apple_Terminal" \
+   -o "$TERM_PROGRAM" = "iTerm.app" ; then
browser_candidates="open $browser_candidates"
fi
# /bin/start indicates MinGW
-- 
1.7.12.1.403.g14e83b4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html