Re: [PATCH/RFC 0/2] Automate updating git-completion.bash a bit

2018-01-17 Thread Duy Nguyen
On Wed, Jan 17, 2018 at 7:51 AM, SZEDER Gábor  wrote:
> On Tue, Jan 16, 2018 at 11:36 AM, Nguyễn Thái Ngọc Duy
>  wrote:
>> I noticed --recurse-submodules was missing from git-grep's completion
>> list. Then I found a couple more options that should be on the list as
>> well, and many more in the future may face the same fate. Perhaps this
>> helps remedy the situation?
>>
>> This lets us extract certain information from git commands and feed it
>> directly to git-completion.bash. Long options will now be completable
>> by default (which also means it is the reviewer's and coder's
>> responsibility to add the "no complete" flag where appropriate), but I
>> think the number of new dangerous options will be much smaller than
>> the number of completable ones.
>>
>> This is not really a new idea. Python has argcomplete that does more
>> or less the same thing.
>
> This has come up before for Git as well, see:
>
>   
> https://public-inbox.org/git/1334140165-24958-1-git-send-email-bebar...@gmail.com/T/#u

Thanks! I did search the archive but couldn't find this one.

>
> I see that your approach solves one of the shortcomings of those older
> patches, namely it makes it possible to omit dangerous options from
> completion.  Great.
>
> I also see that you want to invoke git in a subshell every time the user
> attempts to complete an --option.  Not so great, at least for Windows
> users.  That older thread contains a few ideas about how to do it only
> once by lazy-initializing a variable for each command to hold its
> options.

Noted.
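The lazy-initialization idea from that older thread could look roughly
like the sketch below. This is only an illustration: the names
`__git_options_for` and `list_options` are made up, and `list_options`
stands in for the proposed "git <cmd> --git-completion-helper" call, so
we can watch how often the slow path actually runs.

```shell
# Run the (potentially slow) option-listing helper only on first use,
# then serve later completion attempts from a shell variable.
calls_file=$(mktemp)

list_options () {
	# Stand-in for "git $1 --git-completion-helper"; records each run
	# so we can observe the caching.
	echo run >>"$calls_file"
	echo "--cached --text --ignore-case"
}

__git_options_for () {
	var="__git_opts_$(echo "$1" | tr '-' '_')"
	eval "cached=\${$var}"
	if [ -z "$cached" ]; then
		cached=$(list_options "$1")   # slow path, first use only
		eval "$var=\$cached"          # remember for next time
	fi
	printf '%s\n' "$cached"
}

__git_options_for grep >/dev/null    # first use: runs the helper
__git_options_for grep >/dev/null    # second use: served from the cache
nruns=$(wc -l <"$calls_file")
rm -f "$calls_file"
echo "helper ran $nruns time(s)"
```

The helper runs once; the second completion attempt costs no
subprocess, which is the point for Windows users.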

I see you also pointed out the problem with running commands like
"git-status" without a repository. I'll try to address this too if
possible (I'm thinking about making struct options[] available outside
cmd_*(); then we could handle it more like "git --version", which
always works).
-- 
Duy


Re: [PATCH/RFC 0/2] Automate updating git-completion.bash a bit

2018-01-17 Thread Duy Nguyen
Actually, I forgot another option. What if we automate updating the
script at "compile" time instead of calling git at run time? For
example, with something like the patch below, a contributor could just
run

make update-completion

then add the git-completion.bash changes to the same patch that
introduces new options. If they forget, Junio could always run this
near -rc0.

I know this output is a bit ugly. I could probably make the update
work with wrapped __gitcomp lines to be friendlier to human eyes.

-- 8< --
diff --git a/Makefile b/Makefile
index 1a9b23b679..05eb7c8742 100644
--- a/Makefile
+++ b/Makefile
@@ -2834,3 +2834,5 @@ cover_db: coverage-report
 cover_db_html: cover_db
cover -report html -outputdir cover_db_html cover_db
 
+update-completion:
+	contrib/completion/update.sh contrib/completion/git-completion.bash ./git
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 3683c772c5..e8c224f486 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1585,21 +1585,7 @@ _git_grep ()
 
case "$cur" in
--*)
-   __gitcomp "
-   --cached
-   --text --ignore-case --word-regexp --invert-match
-   --full-name --line-number
-   --extended-regexp --basic-regexp --fixed-strings
-   --perl-regexp
-   --threads
-   --files-with-matches --name-only
-   --files-without-match
-   --max-depth
-   --count
-   --and --or --not --all-match
-   --break --heading --show-function --function-context
-   --untracked --no-index
-   "
+   __gitcomp_auto grep "--cached --text --ignore-case --word-regexp --invert-match --full-name --line-number --extended-regexp --basic-regexp --fixed-strings --perl-regexp --threads --files-with-matches --name-only --files-without-match --max-depth --count --and --or --not --all-match --break --heading --show-function --function-context --untracked --no-index"
return
;;
esac
diff --git a/contrib/completion/update.sh b/contrib/completion/update.sh
new file mode 100755
index 00..99c4841152
--- /dev/null
+++ b/contrib/completion/update.sh
@@ -0,0 +1,8 @@
+#!/bin/sh
+
+file="$1"
+git="$2"
+
+grep __gitcomp_auto "$file" | while read a cmd b; do
+sed -i "s/\\(.*$a $cmd \).*/\\1$("$git" $cmd --git-completion-helper)/" "$file"
+done
-- 8< --
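The sed-rewrite mechanism in update.sh can be sketched in isolation,
without git at all. The file contents and option lists below are made
up for illustration; the real script loops over every `__gitcomp_auto`
marker and regenerates the quoted list from the (then-proposed)
`--git-completion-helper` output instead of a fixed string.

```shell
# Find a marker line in a file and regenerate everything after the
# marker in place with sed.
tmp=$(mktemp)
cat >"$tmp" <<'EOF'
case "$cur" in
--*)
	__gitcomp_auto grep "--old --stale"
	;;
esac
EOF

new="--cached --text --ignore-case"
# Note: "sed -i" with no suffix is GNU sed; BSD/macOS sed needs -i ''
sed -i "s/\(.*__gitcomp_auto grep \).*/\1\"$new\"/" "$tmp"

result=$(grep __gitcomp_auto "$tmp")
rm -f "$tmp"
echo "$result"
```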






Re: [PATCH] Add shell completion for git remote rm

2018-01-17 Thread Duy Nguyen
On Wed, Jan 17, 2018 at 1:17 PM, Kevin Daudt  wrote:
> On Wed, Jan 17, 2018 at 07:44:19AM +0700, Duy Nguyen wrote:
>> PS. This also makes me think about supporting subcommand aliases, so
>> that people can add back 'git remote rm' if they like (or something
>> like 'git r rm' when they alias 'remote' as well). Which is probably a
>> good thing to do. Long command names are fine when you have completion
>> support; they are a pain to type otherwise.
>>
>
> Couldn't they just create an alias like git rrm then, if they use it so
> often that it becomes an issue?

They could. The one case that would not work is when they want to
insert some options between "r" and "rm"; sometimes option positions
matter. Anyway, this is just thinking out loud; maybe we don't really
need it until people scream about it with a valid use case.
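For the simple case Kevin describes, git's one-level aliases already
work today; no new machinery is needed. The repository and remote below
are throwaway examples. (What a plain alias cannot do, per the caveat
above, is accept options injected between the alias word and the
subcommand.)

```shell
# Set up a throwaway repository with one remote.
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git remote add origin https://example.com/repo.git

# A multi-word alias: "git rrm X" expands to "git remote remove X".
git config alias.rrm "remote remove"
git rrm origin
git remote                             # prints nothing: origin is gone
```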

> Having another layer of aliases does create a lot of complexity.

Yes. That's partly the reason I wanted this, actually ;-) Many commands
have gained subcommands nowadays, but there's no shared infrastructure
for managing them. By adding something that works across the board at
the subcommand level, I'm forced to "fix" this (or will probably never
get to do the sub-aliasing, because this "fix" takes forever).
-- 
Duy


Re: GSoC 2018 Org applications. Deadline = January 23, 2018 at 18:00 (CET)

2018-01-17 Thread Christian Couder
On Tue, Jan 16, 2018 at 9:08 PM, Stefan Beller  wrote:
>
> I'll be fine as a co-mentor this year.

Great!

It looks like you also accepted the invite to be an admin, so we are 3
admins now (Matthieu, you and me), maybe 4 if Dscho joins. Our
application is therefore complete.

We can still improve the Idea and Microproject pages though:

https://git.github.io/SoC-2018-Ideas/
https://git.github.io/SoC-2018-Microprojects/


misleading "man git-worktree", is last "add" argument necessarily a "branch"?

2018-01-17 Thread Robert P. J. Day

  perusing "git worktree", the man page reads:

  SYNOPSIS
   git worktree add [-f] [--detach] [--checkout] [--lock]   \
[-b ]  []
 ^^

however, can't that last optional argument be any arbitrary commit,
not just a "branch"?

rday


[ANNOUNCE] Git Rev News edition 35

2018-01-17 Thread Christian Couder
Hi everyone,

The 35th edition of Git Rev News is now published:

  https://git.github.io/rev_news/2018/01/17/edition-35/

Thanks a lot to all the contributors!

Enjoy,
Christian, Jakub, Markus and Gabriel.


Re: misleading "man git-worktree", is last "add" argument necessarily a "branch"?

2018-01-17 Thread Duy Nguyen
On Wed, Jan 17, 2018 at 7:58 PM, Robert P. J. Day  wrote:
>
>   perusing "git worktree", and man page reads:
>
>   SYNOPSIS
>git worktree add [-f] [--detach] [--checkout] [--lock]   \
> [-b ]  []
>  ^^
>
> however, can't that last optional argument be any arbitrary commit,
> not just a "branch"?

It was changed to commit-ish about two months ago in c4738aedc0
(worktree: add can be created from any commit-ish, 2017-11-26).
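The commit-ish behavior is easy to check with a throwaway repository
and any reasonably recent git (the repo, identity, and paths below are
illustrative):

```shell
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m one
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m two

# The last argument may be any commit-ish, not only a branch name:
git worktree add --detach ../wt HEAD~1
git -C ../wt log --format=%s -1        # the worktree sits at commit "one"
```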
-- 
Duy


Re: misleading "man git-worktree", is last "add" argument necessarily a "branch"?

2018-01-17 Thread Robert P. J. Day
On Wed, 17 Jan 2018, Duy Nguyen wrote:

> On Wed, Jan 17, 2018 at 7:58 PM, Robert P. J. Day  
> wrote:
> >
> >   perusing "git worktree", and man page reads:
> >
> >   SYNOPSIS
> >git worktree add [-f] [--detach] [--checkout] [--lock]   \
> > [-b ]  []
> >  ^^
> >
> > however, can't that last optional argument be any arbitrary commit,
> > not just a "branch"?
>
> It's been changed to commit-ish about two months ago in c4738aedc0
> (worktree: add can be created from any commit-ish - 2017-11-26)

  guess I'd better update my repo, thanks.

rday


RE: [PATCH] RelNotes: fsmonitor: add a pointer to man page and the word itself

2018-01-17 Thread Ben Peart
> -Original Message-
> From: Yasushi SHOJI [mailto:ya...@atmark-techno.com]
> Sent: Wednesday, January 17, 2018 12:09 AM
> To: Ben Peart ; gits...@pobox.com
> Cc: git@vger.kernel.org
> Subject: [PATCH] RelNotes: fsmonitor: add a pointer to man page and the
> word itself
> 
> Add a pointer to git-update-index(1) and a bit more detail about fsmonitor
> and watchman to help people following up the new feature.
> ---
> 
> Hi Ben and Junio,
> 
> Wouldn't it be nice to tell the world a bit more about "file system monitor"
> we now support?  I think that "git status" and "watchman"
> might not ring the bell for some, but adding a word "file system monitor"
> may.
> 

Thanks for the call out!  It would be great if everyone who has a larger 
repo and is experiencing slower command performance could learn about the 
new file system monitor support.  I'm not sure whether editing the release 
notes would be enough to do that. 😊  It's important to note that the support 
for file system monitors in git is generic - any file system monitor can be 
integrated with the proper hook/script.  We just provided a Perl script to 
integrate with Watchman as a usable sample, since Watchman is already 
available on several platforms.
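The integration is wired up through a single config knob: point
core.fsmonitor at the shipped Watchman script, or at any program that
speaks the same hook interface. The sketch below only sets and reads
the config in a throwaway repository; no actual monitoring happens
unless Watchman itself is installed, and the hook path is illustrative.

```shell
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"

# Any program with the fsmonitor hook interface can go here.
git config core.fsmonitor "$PWD/.git/hooks/fsmonitor-watchman"
git config core.fsmonitor      # prints the configured hook path
```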

> I know most repos don't need fsmonitor, but it's a cool feature to have. ;-)
> 
> WDYT?
> 
> 
>  Documentation/RelNotes/2.16.0.txt | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/RelNotes/2.16.0.txt
> b/Documentation/RelNotes/2.16.0.txt
> index 919f3eb3e..0c81c5915 100644
> --- a/Documentation/RelNotes/2.16.0.txt
> +++ b/Documentation/RelNotes/2.16.0.txt
> @@ -62,8 +62,11 @@ UI, Workflows & Features
>   * The SubmittingPatches document has been converted to produce an
> HTML version via AsciiDoc/Asciidoctor.
> 
> - * We learned to talk to watchman to speed up "git status" and other
> -   operations that need to see which paths have been modified.
> + * We learned to optionally talk to a file system monitor via the new
> +   fsmonitor extension to speed up "git status" and other operations
> +   that need to see which paths have been modified.  Currently we only
> +   support "watchman".  See the "File System Monitor" section of
> +   git-update-index(1) for more details.
> 
>   * The "diff" family of commands learned to ignore differences in
> carriage return at the end of line.
> --
> 2.15.1


[PATCH v2 2/2] sha1_file: improve sha1_file_name() perfs

2018-01-17 Thread Christian Couder
As sha1_file_name() could be performance sensitive, let's
try to make it faster by seeding the initial buffer size,
to avoid any need to realloc, and by using strbuf_addstr()
and strbuf_addch() instead of strbuf_addf().

Helped-by: Derrick Stolee 
Helped-by: Jeff Hostetler 
Signed-off-by: Christian Couder 
---
 sha1_file.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index f66c21b2da..1a94716962 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -323,8 +323,14 @@ static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
 
 void sha1_file_name(struct strbuf *buf, const unsigned char *sha1)
 {
-   strbuf_addf(buf, "%s/", get_object_directory());
+   const char *obj_dir = get_object_directory();
+   size_t extra = strlen(obj_dir) + 1 + GIT_MAX_HEXSZ;
 
+   if (extra > strbuf_avail(buf))
+   strbuf_grow(buf, extra);
+
+   strbuf_addstr(buf, obj_dir);
+   strbuf_addch(buf, '/');
fill_sha1_path(buf, sha1);
 }
 
-- 
2.16.0.rc2.3.g254af8b974



Re: [PATCH] sha1_file: remove static strbuf from sha1_file_name()

2018-01-17 Thread Christian Couder
On Tue, Jan 16, 2018 at 8:00 PM, Jeff Hostetler  wrote:
>
>
> On 1/16/2018 9:01 AM, Derrick Stolee wrote:
>>
>> On 1/16/2018 2:18 AM, Christian Couder wrote:
>>>
>>> Using a static buffer in sha1_file_name() is error prone
>>> and the performance improvements it gives are not needed
>>> in most of the callers.
>>>
>>> So let's get rid of this static buffer and, if necessary
>>> or helpful, let's use one in the caller.
>>
>>
>> First: this is a good change for preventing bugs in the future. Do not let
>> my next thought deter you from making this change.
>>
>> Second: I wonder if there is any perf hit now that we are allocating
>> buffers much more often.

When I thought that the caller might be performance sensitive, I used a
"static struct strbuf" in the caller to avoid any performance
regression.

Yeah, that means that further work is needed if we want to get rid of
all the static buffers, but that is not my goal. I am just concerned
with cleaning up sha1_file_name() before changing it, and avoiding
some possible trouble when using it.

Feel free to improve on that or even take over this series, otherwise
it can be part of the #leftoverbits.

>> Also, how often does get_object_directory() change,
>> so in some cases we could cache the buffer and only append the parts for the
>> loose object (and not reallocate because the filenames will have equal
>> length).

Again feel free to work on this kind of optimizations on top of this series.

>> I'm concerned about the perf implications when inspecting many loose
>> objects (100k+) but these code paths seem to be involved with more
>> substantial work, such as opening and parsing the objects, so keeping a
>> buffer in-memory is probably unnecessary.

Yeah, I also think it is not necessary to optimize too much.

>>> diff --git a/sha1_file.c b/sha1_file.c
>>> index 3da70ac650..f66c21b2da 100644
>>> --- a/sha1_file.c
>>> +++ b/sha1_file.c
>>> @@ -321,15 +321,11 @@ static void fill_sha1_path(struct strbuf *buf,
>>> const unsigned char *sha1)
>>>   }
>>>   }
>>> -const char *sha1_file_name(const unsigned char *sha1)
>>> +void sha1_file_name(struct strbuf *buf, const unsigned char *sha1)
>>>   {
>>> -static struct strbuf buf = STRBUF_INIT;
>>> -
>>> -strbuf_reset(&buf);
>>> -strbuf_addf(&buf, "%s/", get_object_directory());
>>> +strbuf_addf(buf, "%s/", get_object_directory());
>>> -fill_sha1_path(&buf, sha1);
>>> -return buf.buf;
>>> +fill_sha1_path(buf, sha1);
>>>   }
>>
>>
>> Could you change this to use strbuf_addstr(buf, get_object_directory())
>> followed by a strbuf_addch(buf, '/')? This format string is unnecessary and
>> could become slow if this method is called in a tight loop.
>
> Yes, an _addstr() and _addch() would avoid a sprintf and
> we've seen perf problems with this before.
>
> Could we also seed the initial buffer size to avoid
> any need to realloc as the buffer is filled in?
>
> Something like:
> size_t len = strlen(get_object_directory()) + GIT_MAX_HEXSZ + 3;
> strbuf_reset(&buf);
> if (len > strbuf_avail(&buf))
> strbuf_grow(&buf, len);
> strbuf_addstr(&buf, ...);

Ok, I did something like that in another patch on top of the first
patch which is just about using a "struct strbuf *" passed as an
argument instead of a static buffer.

>>>   struct strbuf *alt_scratch_buf(struct alternate_object_database *alt)
>>> @@ -710,7 +706,12 @@ int check_and_freshen_file(const char *fn, int
>>> freshen)
>>>   static int check_and_freshen_local(const unsigned char *sha1, int
>>> freshen)
>>>   {
>>> -return check_and_freshen_file(sha1_file_name(sha1), freshen);
>>> +static struct strbuf buf = STRBUF_INIT;
>>> +
>>> +strbuf_reset(&buf);
>>> +sha1_file_name(&buf, sha1);
>>> +
>>> +return check_and_freshen_file(buf.buf, freshen);
>>>   }
>
> Does "buf" really need to be static here?  Doesn't this just move the
> problem from sha1_file_name() to here?

Yes, but maybe check_and_freshen_local() is performance sensitive, so
I think it is safer, performance-wise, to still use a static buf.

If there is a consensus that it is OK not to use one here, I am happy
to change that. On the other hand, the change could also be part of
another patch on top of this one...

>>>   static int check_and_freshen_nonlocal(const unsigned char *sha1, int
>>> freshen)
>>> @@ -866,8 +867,12 @@ static int stat_sha1_file(const unsigned char *sha1,
>>> struct stat *st,
>>> const char **path)
>>>   {
>>>   struct alternate_object_database *alt;
>>> +static struct strbuf buf = STRBUF_INIT;
>>> +
>>> +strbuf_reset(&buf);
>>> +sha1_file_name(&buf, sha1);
>>> +*path = buf.buf;
>>> -*path = sha1_file_name(sha1);
>>>   if (!lstat(*path, st))
>>>   return 0;
>
> Again, making "buf" static here feels wrong.  Perhaps change the
> signature of the static function to drop the const on the "path",
> strbuf_detach buf.buf, and give it to the caller (if

[PATCH v2 1/2] sha1_file: remove static strbuf from sha1_file_name()

2018-01-17 Thread Christian Couder
Using a static buffer in sha1_file_name() is error prone
and the performance improvements it gives are not needed
in many of the callers.

So let's get rid of this static buffer and, if necessary
or helpful, let's use one in the caller.

Suggested-by: Jeff Hostetler 
Helped-by: Kevin Daudt 
Signed-off-by: Christian Couder 
---
 cache.h   |  8 +++-
 http-walker.c |  6 --
 http.c| 16 ++--
 sha1_file.c   | 38 +-
 4 files changed, 42 insertions(+), 26 deletions(-)

diff --git a/cache.h b/cache.h
index d8b975a571..6db565408e 100644
--- a/cache.h
+++ b/cache.h
@@ -957,12 +957,10 @@ extern void check_repository_format(void);
 #define TYPE_CHANGED0x0040
 
 /*
- * Return the name of the file in the local object database that would
- * be used to store a loose object with the specified sha1.  The
- * return value is a pointer to a statically allocated buffer that is
- * overwritten each time the function is called.
+ * Put in `buf` the name of the file in the local object database that
+ * would be used to store a loose object with the specified sha1.
  */
-extern const char *sha1_file_name(const unsigned char *sha1);
+extern void sha1_file_name(struct strbuf *buf, const unsigned char *sha1);
 
 /*
  * Return an abbreviated sha1 unique within this repository's object database.
diff --git a/http-walker.c b/http-walker.c
index 1ae8363de2..07c2b1af82 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -544,8 +544,10 @@ static int fetch_object(struct walker *walker, unsigned char *sha1)
} else if (hashcmp(obj_req->sha1, req->real_sha1)) {
ret = error("File %s has bad hash", hex);
} else if (req->rename < 0) {
-   ret = error("unable to write sha1 filename %s",
-   sha1_file_name(req->sha1));
+   struct strbuf buf = STRBUF_INIT;
+   sha1_file_name(&buf, req->sha1);
+   ret = error("unable to write sha1 filename %s", buf.buf);
+   strbuf_release(&buf);
}
 
release_http_object_request(req);
diff --git a/http.c b/http.c
index 5977712712..5979305bc9 100644
--- a/http.c
+++ b/http.c
@@ -2168,7 +2168,7 @@ struct http_object_request *new_http_object_request(const char *base_url,
unsigned char *sha1)
 {
char *hex = sha1_to_hex(sha1);
-   const char *filename;
+   struct strbuf filename = STRBUF_INIT;
char prevfile[PATH_MAX];
int prevlocal;
char prev_buf[PREV_BUF_SIZE];
@@ -2180,14 +2180,15 @@ struct http_object_request *new_http_object_request(const char *base_url,
hashcpy(freq->sha1, sha1);
freq->localfile = -1;
 
-   filename = sha1_file_name(sha1);
+   sha1_file_name(&filename, sha1);
snprintf(freq->tmpfile, sizeof(freq->tmpfile),
-"%s.temp", filename);
+"%s.temp", filename.buf);
 
-   snprintf(prevfile, sizeof(prevfile), "%s.prev", filename);
+   snprintf(prevfile, sizeof(prevfile), "%s.prev", filename.buf);
unlink_or_warn(prevfile);
rename(freq->tmpfile, prevfile);
unlink_or_warn(freq->tmpfile);
+   strbuf_release(&filename);
 
if (freq->localfile != -1)
error("fd leakage in start: %d", freq->localfile);
@@ -2302,6 +2303,7 @@ void process_http_object_request(struct http_object_request *freq)
 int finish_http_object_request(struct http_object_request *freq)
 {
struct stat st;
+   struct strbuf filename = STRBUF_INIT;
 
close(freq->localfile);
freq->localfile = -1;
@@ -2327,8 +2329,10 @@ int finish_http_object_request(struct http_object_request *freq)
unlink_or_warn(freq->tmpfile);
return -1;
}
-   freq->rename =
-   finalize_object_file(freq->tmpfile, sha1_file_name(freq->sha1));
+
+   sha1_file_name(&filename, freq->sha1);
+   freq->rename = finalize_object_file(freq->tmpfile, filename.buf);
+   strbuf_release(&filename);
 
return freq->rename;
 }
diff --git a/sha1_file.c b/sha1_file.c
index 3da70ac650..f66c21b2da 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -321,15 +321,11 @@ static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
}
 }
 
-const char *sha1_file_name(const unsigned char *sha1)
+void sha1_file_name(struct strbuf *buf, const unsigned char *sha1)
 {
-   static struct strbuf buf = STRBUF_INIT;
-
-   strbuf_reset(&buf);
-   strbuf_addf(&buf, "%s/", get_object_directory());
+   strbuf_addf(buf, "%s/", get_object_directory());
 
-   fill_sha1_path(&buf, sha1);
-   return buf.buf;
+   fill_sha1_path(buf, sha1);
 }
 
 struct strbuf *alt_scratch_buf(struct alternate_object_database *alt)
@@ -710,7 +706,12 @@ int check_and_freshen_file(const char *fn, int freshen)
 
 static int check_and_freshen_local(const unsigned char *sha1, int freshen)
 {
-   return check_and_freshen_file(sha1_file

[PATCH v2 2/2] send-email: Support separate Reply-To address

2018-01-17 Thread Christian Ludwig
In some projects, contributions from groups are only accepted from a
common group email address, but each individual may want to receive
replies to their own personal address. That's what the 'Reply-To'
header is for in email.

Introduce an optional '--reply-to' command line option. The $reply_to
variable name was taken by the 'In-Reply-To' header field until the
previous patch renamed that to $in_reply_to, so $reply_to can now hold
the new header's value.

Signed-off-by: Christian Ludwig 
---
 Documentation/git-send-email.txt   |  5 +
 contrib/completion/git-completion.bash |  2 +-
 git-send-email.perl| 18 +-
 t/t9001-send-email.sh  |  2 ++
 4 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt
index 8060ea35c..71ef97ba9 100644
--- a/Documentation/git-send-email.txt
+++ b/Documentation/git-send-email.txt
@@ -84,6 +84,11 @@ See the CONFIGURATION section for `sendemail.multiEdit`.
the value of GIT_AUTHOR_IDENT, or GIT_COMMITTER_IDENT if that is not
set, as returned by "git var -l".
 
+--reply-to=::
+   Specify the address where replies from recipients should go.
+   Use this if replies to messages should go to an address other than
+   the one specified with the --from parameter.
+
 --in-reply-to=::
Make the first mail (or all the mails with `--no-thread`) appear as a
reply to the given Message-Id, which avoids breaking threads to
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 3683c772c..2a0dc4eef 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -2081,7 +2081,7 @@ _git_send_email ()
--compose --confirm= --dry-run --envelope-sender
--from --identity
--in-reply-to --no-chain-reply-to --no-signed-off-by-cc
-   --no-suppress-from --no-thread --quiet
+   --no-suppress-from --no-thread --quiet --reply-to
--signed-off-by-cc --smtp-pass --smtp-server
--smtp-server-port --smtp-encryption= --smtp-user
--subject --suppress-cc= --suppress-from --thread --to
diff --git a/git-send-email.perl b/git-send-email.perl
index 0c07f48d5..9bf758307 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -56,6 +56,7 @@ git send-email --dump-aliases
 --[no-]cc * Email Cc:
 --[no-]bcc* Email Bcc:
 --subject * Email "Subject:"
+--reply-to* Email "Reply-To:"
 --in-reply-to * Email "In-Reply-To:"
 --[no-]xmailer * Add "X-Mailer:" header (default).
--[no-]annotate* Review each patch that will be sent in an editor.
@@ -166,7 +167,7 @@ my $re_encoded_word = qr/=\?($re_token)\?($re_token)\?($re_encoded_text)\?=/;
 
 # Variables we fill in automatically, or via prompting:
 my (@to,$no_to,@initial_to,@cc,$no_cc,@initial_cc,@bcclist,$no_bcc,@xh,
-   $initial_in_reply_to,$initial_subject,@files,
+   $initial_in_reply_to,$reply_to,$initial_subject,@files,
$author,$sender,$smtp_authpass,$annotate,$use_xmailer,$compose,$time);
 
 my $envelope_sender;
@@ -315,6 +316,7 @@ die __("--dump-aliases incompatible with other options\n")
 $rc = GetOptions(
"sender|from=s" => \$sender,
 "in-reply-to=s" => \$initial_in_reply_to,
+   "reply-to=s" => \$reply_to,
"subject=s" => \$initial_subject,
"to=s" => \@initial_to,
"to-cmd=s" => \$to_cmd,
@@ -677,6 +679,7 @@ if ($compose) {
my $tpl_sender = $sender || $repoauthor || $repocommitter || '';
my $tpl_subject = $initial_subject || '';
my $tpl_in_reply_to = $initial_in_reply_to || '';
+   my $tpl_reply_to = $reply_to || '';
 
print $c <
 References: 
+Reply-To: Reply 
 
 Result: OK
 EOF
@@ -297,6 +298,7 @@ test_expect_success $PREREQ 'Show all headers' '
--dry-run \
--suppress-cc=sob \
--from="Example " \
+   --reply-to="Reply " \
--to=t...@example.com \
--cc=c...@example.com \
--bcc=b...@example.com \
-- 
2.15.1



[PATCH v2 1/2] send-email: Rename variable for clarity

2018-01-17 Thread Christian Ludwig
Email messages have both a 'Reply-To' and an 'In-Reply-To' header
field. We only use the latter. To avoid confusion, rename the variable
for it.

Signed-off-by: Christian Ludwig 
---
 git-send-email.perl | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index edcc6d346..0c07f48d5 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -166,13 +166,13 @@ my $re_encoded_word = qr/=\?($re_token)\?($re_token)\?($re_encoded_text)\?=/;
 
 # Variables we fill in automatically, or via prompting:
 my (@to,$no_to,@initial_to,@cc,$no_cc,@initial_cc,@bcclist,$no_bcc,@xh,
-   $initial_reply_to,$initial_subject,@files,
+   $initial_in_reply_to,$initial_subject,@files,
$author,$sender,$smtp_authpass,$annotate,$use_xmailer,$compose,$time);
 
 my $envelope_sender;
 
 # Example reply to:
-#$initial_reply_to = ''; #<20050203173208.ga23...@foobar.com>';
+#$initial_in_reply_to = ''; #<20050203173208.ga23...@foobar.com>';
 
 my $repo = eval { Git->repository() };
 my @repo = $repo ? ($repo) : ();
@@ -314,7 +314,7 @@ die __("--dump-aliases incompatible with other options\n")
 if !$help and $dump_aliases and @ARGV;
 $rc = GetOptions(
"sender|from=s" => \$sender,
-"in-reply-to=s" => \$initial_reply_to,
+"in-reply-to=s" => \$initial_in_reply_to,
"subject=s" => \$initial_subject,
"to=s" => \@initial_to,
"to-cmd=s" => \$to_cmd,
@@ -676,7 +676,7 @@ if ($compose) {
 
my $tpl_sender = $sender || $repoauthor || $repocommitter || '';
my $tpl_subject = $initial_subject || '';
-   my $tpl_reply_to = $initial_reply_to || '';
+   my $tpl_in_reply_to = $initial_in_reply_to || '';
 
print $c < "",
valid_re => qr/\@.*\./, confirm_only => 1);
 }
-if (defined $initial_reply_to) {
-   $initial_reply_to =~ s/^\s*?\s*$//;
-   $initial_reply_to = "<$initial_reply_to>" if $initial_reply_to ne '';
+if (defined $initial_in_reply_to) {
+   $initial_in_reply_to =~ s/^\s*?\s*$//;
+   $initial_in_reply_to = "<$initial_in_reply_to>" if $initial_in_reply_to ne '';
 }
 
 if (!defined $smtp_server) {
@@ -901,7 +901,7 @@ if ($compose && $compose > 0) {
 }
 
 # Variables we set as part of the loop over files
-our ($message_id, %mail, $subject, $reply_to, $references, $message,
+our ($message_id, %mail, $subject, $in_reply_to, $references, $message,
$needs_confirm, $message_num, $ask_default);
 
 sub extract_valid_address {
@@ -1310,9 +1310,9 @@ Message-Id: $message_id
if ($use_xmailer) {
$header .= "X-Mailer: git-send-email $gitversion\n";
}
-   if ($reply_to) {
+   if ($in_reply_to) {
 
-   $header .= "In-Reply-To: $reply_to\n";
+   $header .= "In-Reply-To: $in_reply_to\n";
$header .= "References: $references\n";
}
if (@xh) {
@@ -1489,8 +1489,8 @@ EOF
return 1;
 }
 
-$reply_to = $initial_reply_to;
-$references = $initial_reply_to || '';
+$in_reply_to = $initial_in_reply_to;
+$references = $initial_in_reply_to || '';
 $subject = $initial_subject;
 $message_num = 0;
 
@@ -1700,9 +1700,9 @@ foreach my $t (@files) {
 
# set up for the next message
if ($thread && $message_was_sent &&
-   ($chain_reply_to || !defined $reply_to || length($reply_to) == 0 ||
+   ($chain_reply_to || !defined $in_reply_to || length($in_reply_to) == 0 ||
$message_num == 1)) {
-   $reply_to = $message_id;
+   $in_reply_to = $message_id;
if (length $references > 0) {
$references .= "\n $message_id";
} else {
-- 
2.15.1



Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

2018-01-17 Thread Jonathan Nieder
Hi,

Duy Nguyen wrote:
> On Wed, Jan 17, 2018 at 4:42 AM, Brandon Williams  wrote:

>>  IIUC Split index is an index extension
>> that can be enabled to limit the size of the index file that is written
>> when making changes to the index.  It breaks the index into two pieces,
>> index (which contains only changes) and sharedindex.X (which
>> contains unchanged information) where 'X' is a value found in the
>> index file.  If we don't do anything fancy then these two files live
>> next to one another in a repository's git directory at $GIT_DIR/index
>> and $GIT_DIR/sharedindex.X.  This seems to work all well and fine
>> except that this isn't always the case and the read_index_from function
>> takes this into account by enabling a caller to specify a path to where
>> the index file is located.  We can do this by specifying the index file
>> we want to use by setting GIT_INDEX_FILE.
[...]
>> In this case if i were to specify a location of an
>> index file in my home directory '~/index' and be using the split index
>> feature then the corresponding sharedindex file would live in my
>> repository's git directory '~/project/.git/sharedindex.X'.  So the
>> sharedindex file is always located relative to the project's git
>> directory and not the index file itself, which is kind of confusing.
>> Maybe a better design would be to have the sharedindex file located
>> relative to the index file.
>
> That adds more problems. Now when you move the index file around you
> have to move the shared index file too (think about atomic rename,
> which we use in plenty of places; we can't achieve that by moving two
> files). A new dependency on $GIT_DIR is not that confusing to me; the
> index file is useless anyway if you don't have access to
> $GIT_DIR/objects. There was always the option to _not_ split the index
> when $GIT_INDEX_FILE is specified. I think I did consider that but
> dropped it because we'd lose the performance gain from splitting.

Can you elaborate a little more on this?

At first glance, it seems simpler to say "paths in index extensions
named in the index file are relative to the location of the index
file" and to make moving the index file also require moving the shared
index file, exactly as you say.  So at least from a "principle of
least surprise" perspective I would be tempted to go that way.

It's true that we rely on atomic rename in plenty of places, but only
within a directory.  (Filesystem boundaries, NFS, etc mean that atomic
renames across directories are a lost cause.)

Fortunately index files (including temp index files used by scripts)
tend to only be in $GIT_DIR, for exactly that reason.  So I am
wondering if switching to index-file-relative semantics would be an
invasive move and what the pros and cons of such a move are.

Thanks,
Jonathan


Re: [PATCH] Add shell completion for git remote rm

2018-01-17 Thread Junio C Hamano
Duy Nguyen  writes:

> Looks good. If we want to be more careful, we can follow up with
> something even more annoying like this before removing 'rm'
>
> -- 8< --
> diff --git a/Documentation/git-remote.txt b/Documentation/git-remote.txt
> index 577b969c1b..0a544703e6 100644
> --- a/Documentation/git-remote.txt
> +++ b/Documentation/git-remote.txt
> @@ -90,7 +90,6 @@ In case <old> and <new> are the same, and <old> is a file under
>  the configuration file format.
>  
>  'remove'::
> -'rm'::
>  
>  Remove the remote named <name>. All remote-tracking branches and
>  configuration settings for the remote are removed.
> diff --git a/builtin/remote.c b/builtin/remote.c
> index d95bf904c3..774ef6931e 100644
> --- a/builtin/remote.c
> +++ b/builtin/remote.c
> @@ -1609,7 +1609,10 @@ int cmd_remote(int argc, const char **argv, const char *prefix)
>   result = add(argc, argv);
>   else if (!strcmp(argv[0], "rename"))
>   result = mv(argc, argv);
> - else if (!strcmp(argv[0], "rm") || !strcmp(argv[0], "remove"))
> + else if (!strcmp(argv[0], "rm")) {
> +		warning(_("'rm' is a deprecated synonym. Use 'remove' instead."));
> + result = rm(argc, argv);
> + } else if (!strcmp(argv[0], "remove"))
>   result = rm(argc, argv);
>   else if (!strcmp(argv[0], "set-head"))
>   result = set_head(argc, argv);
> -- 8< --

Yes, this is a sensible way to properly deprecate it further before
removal.


Re: [PATCH] RelNotes: fsmonitor: add a pointer to man page and the word itself

2018-01-17 Thread Junio C Hamano
Ben Peart  writes:

> It's important to note that the support for file system monitors
> in git is generic - any file system monitor can be integrated with
> the proper hook/script.  We just provided a Perl script to
> integrate with Watchman as a usable sample as Watchman is already
> available on several platforms.

Mentioning only "FS monitor" is insufficient (it would merely
frustrate those with an FS monitor that is not Watchman, after they
waste more time finding out that we only do Watchman right now).
Mentioning only "Watchman" *is* sufficient, though.

Mentioning both would be ideal, i.e. e.g. "We now have a generic
mechanism to talk to FS monitors to optimize checks of working tree
files, and this version of Git can talk to Watchman using the
mechanism".  And that is what Yashi's suggestion did, if I am not
mistaken.




Re* misleading "man git-worktree", is last "add" argument necessarily a "branch"?

2018-01-17 Thread Junio C Hamano
Duy Nguyen  writes:

> On Wed, Jan 17, 2018 at 7:58 PM, Robert P. J. Day  
> wrote:
>>
>>   perusing "git worktree", and man page reads:
>>
>>   SYNOPSIS
>>git worktree add [-f] [--detach] [--checkout] [--lock]   \
>> [-b <new-branch>] <path> [<branch>]
>>  ^^
>>
>> however, can't that last optional argument be any arbitrary commit,
>> not just a "branch"?
>
> It's been changed to commit-ish about two months ago in c4738aedc0
> (worktree: add can be created from any commit-ish - 2017-11-26)

Indeed "git worktree --help" is more up to date, but "git worktree
-h" is stale.

-- >8 --
Subject: worktree: say that "add" takes an arbitrary commit in short-help

c4738aed ("worktree: add can be created from any commit-ish",
2017-11-26) taught "git worktree add" to start a new worktree
with an arbitrary commit-ish checked out, not limited to a tip
of a branch.  

"git worktree --help" was updated to describe this, but we forgot to
update "git worktree -h".

Signed-off-by: Junio C Hamano 
---
 builtin/worktree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 7cef5b120b..9efdc22466 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -14,7 +14,7 @@
 #include "worktree.h"
 
 static const char * const worktree_usage[] = {
-	N_("git worktree add [<options>] <path> [<branch>]"),
+	N_("git worktree add [<options>] <path> [<commit-ish>]"),
 	N_("git worktree list [<options>]"),
 	N_("git worktree lock [<options>] <worktree>"),
 	N_("git worktree prune [<options>]"),




[PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Christoph Hellwig
fsync is required for data integrity as there is no guarantee that
data makes it to disk at any specified time without it.  Even for
ext3 with data=ordered mode the file system will only commit all
data at some point in time that is not guaranteed.

I've lost data on development machines with various times countless
times due to the lack of this option, and now lost trees on a
git server with ext4 as well yesterday.  It's time to make git
safe by default.

Signed-off-by: Christoph Hellwig 
---
 Documentation/config.txt | 6 ++
 environment.c| 2 +-
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0e25b2c92..9a1cec5c8 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -866,10 +866,8 @@ core.whitespace::
 core.fsyncObjectFiles::
This boolean will enable 'fsync()' when writing object files.
 +
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+This option is enabled by default and ensures actual data integrity
+by calling fsync after writing object files.
 
 core.preloadIndex::
Enable parallel index preload for operations like 'git diff'
diff --git a/environment.c b/environment.c
index 63ac38a46..c74375b5e 100644
--- a/environment.c
+++ b/environment.c
@@ -36,7 +36,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+int fsync_object_files = 1;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
-- 
2.14.2



Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Junio C Hamano
Christoph Hellwig  writes:

> fsync is required for data integrity as there is no guarantee that
> data makes it to disk at any specified time without it.  Even for
> ext3 with data=ordered mode the file system will only commit all
> data at some point in time that is not guaranteed.

It comes from this one:

commit aafe9fbaf4f1d1f27a6f6e3eb3e246fff81240ef
Author: Linus Torvalds 
Date:   Wed Jun 18 15:18:44 2008 -0700

Add config option to enable 'fsync()' of object files

As explained in the documentation[*] this is totally useless on
filesystems that do ordered/journalled data writes, but it can be a
useful safety feature on filesystems like HFS+ that only journal the
metadata, not the actual file contents.

It defaults to off, although we could presumably in theory some day
auto-enable it on a per-filesystem basis.

[*] Yes, I updated the docs for the thing.  Hell really _has_ frozen
over, and the four horsemen are probably just beyond the horizon.
EVERYBODY PANIC!

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 0e25b2c92..9a1cec5c8 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -866,10 +866,8 @@ core.whitespace::
>  core.fsyncObjectFiles::
>   This boolean will enable 'fsync()' when writing object files.
>  +
> -This is a total waste of time and effort on a filesystem that orders
> -data writes properly, but can be useful for filesystems that do not use
> -journalling (traditional UNIX filesystems) or that only journal metadata
> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> +This option is enabled by default and ensures actual data integrity
> +by calling fsync after writing object files.

I am somewhat sympathetic to the desire to flip the default to
"safe" and allow those who know they are already safe to tweak the
knob for performance, and it also makes sense to document that the
default is "true" here.  But I do not see the point of removing the
four lines from this paragraph; the sole effect of the removal is to
rob information from readers that they can use to decide if they
want to disable the configuration, no?


Re: [PATCH v2 1/2] send-email: Rename variable for clarity

2018-01-17 Thread Junio C Hamano
Christian Ludwig  writes:

> The SMTP protocol has both, the 'Reply-To' and the 'In-Reply-To' header
> fields. We only use the latter. To avoid confusion, rename the variable
> for it.
>
> Signed-off-by: Christian Ludwig 
> ---

Makes sense.

>  git-send-email.perl | 38 +++---
>  1 file changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/git-send-email.perl b/git-send-email.perl
> index edcc6d346..0c07f48d5 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -166,13 +166,13 @@ my $re_encoded_word = qr/=\?($re_token)\?($re_token)\?($re_encoded_text)\?=/;
>  
>  # Variables we fill in automatically, or via prompting:
>  my (@to,$no_to,@initial_to,@cc,$no_cc,@initial_cc,@bcclist,$no_bcc,@xh,
> - $initial_reply_to,$initial_subject,@files,
> + $initial_in_reply_to,$initial_subject,@files,
>   $author,$sender,$smtp_authpass,$annotate,$use_xmailer,$compose,$time);
>  
>  my $envelope_sender;
>  
>  # Example reply to:
> -#$initial_reply_to = ''; #<20050203173208.ga23...@foobar.com>';
> +#$initial_in_reply_to = ''; #<20050203173208.ga23...@foobar.com>';
>  
>  my $repo = eval { Git->repository() };
>  my @repo = $repo ? ($repo) : ();
> @@ -314,7 +314,7 @@ die __("--dump-aliases incompatible with other options\n")
>  if !$help and $dump_aliases and @ARGV;
>  $rc = GetOptions(
>   "sender|from=s" => \$sender,
> -"in-reply-to=s" => \$initial_reply_to,
> +"in-reply-to=s" => \$initial_in_reply_to,
>   "subject=s" => \$initial_subject,
>   "to=s" => \@initial_to,
>   "to-cmd=s" => \$to_cmd,
> @@ -676,7 +676,7 @@ if ($compose) {
>  
>   my $tpl_sender = $sender || $repoauthor || $repocommitter || '';
>   my $tpl_subject = $initial_subject || '';
> - my $tpl_reply_to = $initial_reply_to || '';
> + my $tpl_in_reply_to = $initial_in_reply_to || '';
>  
>  	print $c <<EOT, Git::prefix_lines("GIT: ", __ <<EOT2), <<EOT3;
>  From $tpl_sender # This line is ignored.
> @@ -689,7 +689,7 @@ Clear the body content if you don't wish to send a summary.
>  EOT2
>  From: $tpl_sender
>  Subject: $tpl_subject
> -In-Reply-To: $tpl_reply_to
> +In-Reply-To: $tpl_in_reply_to
>  
>  EOT3
>   for my $f (@files) {
> @@ -736,7 +736,7 @@ EOT3
>   quote_subject($subject, $compose_encoding) .
>   "\n";
>   } elsif (/^In-Reply-To:\s*(.+)\s*$/i) {
> - $initial_reply_to = $1;
> + $initial_in_reply_to = $1;
>   next;
>   } elsif (/^From:\s*(.+)\s*$/i) {
>   $sender = $1;
> @@ -872,16 +872,16 @@ sub expand_one_alias {
>  @initial_cc = process_address_list(@initial_cc);
>  @bcclist = process_address_list(@bcclist);
>  
> -if ($thread && !defined $initial_reply_to && $prompting) {
> - $initial_reply_to = ask(
> +if ($thread && !defined $initial_in_reply_to && $prompting) {
> + $initial_in_reply_to = ask(
> 		__("Message-ID to be used as In-Reply-To for the first email (if any)? "),
>   default => "",
>   valid_re => qr/\@.*\./, confirm_only => 1);
>  }
> -if (defined $initial_reply_to) {
> -	$initial_reply_to =~ s/^\s*<?//;
> -	$initial_reply_to =~ s/>?\s*$//;
> -	$initial_reply_to = "<$initial_reply_to>" if $initial_reply_to ne '';
> +if (defined $initial_in_reply_to) {
> +	$initial_in_reply_to =~ s/^\s*<?//;
> +	$initial_in_reply_to =~ s/>?\s*$//;
> +	$initial_in_reply_to = "<$initial_in_reply_to>" if $initial_in_reply_to ne '';
>  }
>  
>  if (!defined $smtp_server) {
> @@ -901,7 +901,7 @@ if ($compose && $compose > 0) {
>  }
>  
>  # Variables we set as part of the loop over files
> -our ($message_id, %mail, $subject, $reply_to, $references, $message,
> +our ($message_id, %mail, $subject, $in_reply_to, $references, $message,
>   $needs_confirm, $message_num, $ask_default);
>  
>  sub extract_valid_address {
> @@ -1310,9 +1310,9 @@ Message-Id: $message_id
>   if ($use_xmailer) {
>   $header .= "X-Mailer: git-send-email $gitversion\n";
>   }
> -	if ($reply_to) {
> +	if ($in_reply_to) {
>  
> - $header .= "In-Reply-To: $reply_to\n";
> + $header .= "In-Reply-To: $in_reply_to\n";
>   $header .= "References: $references\n";
>   }
>   if (@xh) {
> @@ -1489,8 +1489,8 @@ EOF
>   return 1;
>  }
>  
> -$reply_to = $initial_reply_to;
> -$references = $initial_reply_to || '';
> +$in_reply_to = $initial_in_reply_to;
> +$references = $initial_in_reply_to || '';
>  $subject = $initial_subject;
>  $message_num = 0;
>  
> @@ -1700,9 +1700,9 @@ foreach my $t (@files) {
>  
>   # set up for the next message
>   if ($thread && $message_was_sent &&
> -		($chain_reply_to || !defined $reply_to || length($reply_to) == 0 ||
> +		($chain_reply_to || !defined $in_reply_to || length($in_reply_to) == 0 ||

[PATCH] packfile: use get_be64() for large offsets

2018-01-17 Thread Derrick Stolee
The pack-index version 2 format uses two 4-byte integers in network-byte order 
to represent one 8-byte value. The current implementation has several code 
clones for stitching these integers together.

Use get_be64() to create an 8-byte integer from two 4-byte integers represented 
this way.

Signed-off-by: Derrick Stolee 
---
 pack-revindex.c | 6 ++
 packfile.c  | 3 +--
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/pack-revindex.c b/pack-revindex.c
index 1b7ebd8d7e..ff5f62c033 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -134,10 +134,8 @@ static void create_pack_revindex(struct packed_git *p)
		if (!(off & 0x80000000)) {
p->revindex[i].offset = off;
} else {
-   p->revindex[i].offset =
-   ((uint64_t)ntohl(*off_64++)) << 32;
-   p->revindex[i].offset |=
-   ntohl(*off_64++);
+   p->revindex[i].offset = get_be64(off_64);
+   off_64 += 2;
}
p->revindex[i].nr = i;
}
diff --git a/packfile.c b/packfile.c
index 4a5fe7ab18..228ed0d59a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1702,8 +1702,7 @@ off_t nth_packed_object_offset(const struct packed_git *p, uint32_t n)
return off;
	index += p->num_objects * 4 + (off & 0x7fffffff) * 8;
check_pack_index_ptr(p, index);
-	return (((uint64_t)ntohl(*((uint32_t *)(index + 0)))) << 32) |
-	       ntohl(*((uint32_t *)(index + 4)));
+   return get_be64(index);
}
 }
 
-- 
2.15.0



Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Christoph Hellwig
On Wed, Jan 17, 2018 at 11:04:32AM -0800, Junio C Hamano wrote:
> I am somewhat sympathetic to the desire to flip the default to
> "safe" and allow those who know they are already safe to tweak the
> knob for performance, and it also makes sense to document that the
> default is "true" here.  But I do not see the point of removing the
> four lines from this paragraph; the sole effect of the removal is to
> rob information from readers that they can use to decide if they
> want to disable the configuration, no?

Does this sound any better?  It is a little too technical for my
taste, but I couldn't come up with anything better:

---
From ab8f2d38dfe40e74de5399af0d069427c7473b76 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Wed, 17 Jan 2018 19:42:46 +0100
Subject: enable core.fsyncObjectFiles by default

fsync is required for data integrity as there is no guarantee that
data makes it to disk at any specified time without it.  Even for
ext3 with data=ordered mode the file system will only commit all
data at some point in time that is not guaranteed.

I've lost data on development machines with various times countless
times due to the lack of this option, and now lost trees on a
git server with ext4 as well yesterday.  It's time to make git
safe by default.

Signed-off-by: Christoph Hellwig 
---
 Documentation/config.txt | 11 +++
 environment.c|  2 +-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0e25b2c92..8b99f1389 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -866,10 +866,13 @@ core.whitespace::
 core.fsyncObjectFiles::
This boolean will enable 'fsync()' when writing object files.
 +
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+This option is enabled by default and ensures actual data integrity
+by calling fsync after writing object files.
+
+Note that this might be really slow on ext3 in the traditional
+data=ordered mode, in which case you might want to disable this option.
+ext3 in data=ordered mode will order the actual data writeout with
+metadata operation, but not actually guarantee data integrity either.
 
 core.preloadIndex::
Enable parallel index preload for operations like 'git diff'
diff --git a/environment.c b/environment.c
index 63ac38a46..c74375b5e 100644
--- a/environment.c
+++ b/environment.c
@@ -36,7 +36,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+int fsync_object_files = 1;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
-- 
2.14.2





Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Matthew Wilcox
On Wed, Jan 17, 2018 at 11:04:32AM -0800, Junio C Hamano wrote:
> Christoph Hellwig  writes:
> > @@ -866,10 +866,8 @@ core.whitespace::
> >  core.fsyncObjectFiles::
> > This boolean will enable 'fsync()' when writing object files.
> >  +
> > -This is a total waste of time and effort on a filesystem that orders
> > -data writes properly, but can be useful for filesystems that do not use
> > -journalling (traditional UNIX filesystems) or that only journal metadata
> > -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> > +This option is enabled by default and ensures actual data integrity
> > +by calling fsync after writing object files.
> 
> I am somewhat sympathetic to the desire to flip the default to
> "safe" and allow those who know they are already safe to tweak the
> knob for performance, and it also makes sense to document that the
> default is "true" here.  But I do not see the point of removing the
> four lines from this paragraph; the sole effect of the removal is to
> rob information from readers that they can use to decide if they
> want to disable the configuration, no?

How about this instead?

This option is enabled by default and ensures data integrity by calling
fsync after writing object files.  It is not necessary on filesystems
which journal data writes, but is still necessary on filesystems which
do not use journalling (ext2), or that only journal metadata writes
(OS X's HFS+, or Linux's ext4 with "data=writeback").  Turning this
option off will increase performance at the possible risk of data loss.



Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Christoph Hellwig
On Wed, Jan 17, 2018 at 11:37:31AM -0800, Matthew Wilcox wrote:
> How about this instead?
> 
> This option is enabled by default and ensures data integrity by calling
> fsync after writing object files.  It is not necessary on filesystems
> which journal data writes, but is still necessary on filesystems which
> do not use journalling (ext2), or that only journal metadata writes
> (OS X's HFS+, or Linux's ext4 with "data=writeback").  Turning this
> option off will increase performance at the possible risk of data loss.

I think this goes entirely into the wrong direction.  The point is
fsync is always the right thing to do.  But on ext3 (and ext3 only)
the right thing is way too painful, and conveniently ext3 happens
to be almost ok without it.  So if anything should get a special
mention it is ext3.


Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Andreas Schwab
On Jan 17 2018, Christoph Hellwig  wrote:

> I've lost data on development machines with various times countless
> times due to the lack of this option, and now lost trees on a

Too many times. :-)

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files"

2018-01-17 Thread Johannes Schindelin
Hi Michael,

On Mon, 15 Jan 2018, Michael Haggerty wrote:

> Thanks for your patch. I haven't measured the performance difference
> of `mmap()` vs. `read()` for small `packed-refs` files, but it's not
> surprising that `read()` would be faster.
> 
> I especially like the fix for zero-length `packed-refs` files. (Even
> though AFAIK Git never writes such files, they are totally legitimate
> and shouldn't cause Git to fail.) With or without the additions
> mentioned below,
> 
> Reviewed-by: Michael Haggerty 
> 
> While reviewing your patch, I realized that some areas of the existing
> code use constructs that are undefined according to the C standard,
> such as computing `NULL + 0` and `NULL - NULL`. This was already wrong
> (and would come up more frequently after your change). Even though
> these are unlikely to be problems in the real world, it would be good
> to avoid them.
> 
> So I will follow up this email with three patches:
> 
> 1. Mention that `snapshot::buf` can be NULL for empty files
> 
>I suggest squashing this into your patch, to make it clear that
>`snapshot::buf` and `snapshot::eof` can also be NULL if the
>`packed-refs` file is empty.
> 
> 2. create_snapshot(): exit early if the file was empty
> 
>Avoid undefined behavior by returning early if `snapshot->buf` is
>NULL.
> 
> 3. find_reference_location(): don't invoke if `snapshot->buf` is NULL
> 
>Avoid undefined behavior and confusing semantics by not calling
>`find_reference_location()` when `snapshot->buf` is NULL.
> 
> Michael
> 
> Michael Haggerty (3):
>   SQUASH? Mention that `snapshot::buf` can be NULL for empty files
>   create_snapshot(): exit early if the file was empty
>   find_reference_location(): don't invoke if `snapshot->buf` is NULL

I reviewed those patches and find them straight-forward (and obviously
good).

Thanks,
Dscho


Re: [PATCH v2 2/2] sha1_file: improve sha1_file_name() perfs

2018-01-17 Thread Jeff Hostetler



On 1/17/2018 12:54 PM, Christian Couder wrote:

As sha1_file_name() could be performance sensitive, let's
try to make it faster by seeding the initial buffer size
to avoid any need to realloc and by using strbuf_addstr()
and strbuf_addc() instead of strbuf_addf().

Helped-by: Derrick Stolee 
Helped-by: Jeff Hostetler 
Signed-off-by: Christian Couder 
---
  sha1_file.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index f66c21b2da..1a94716962 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -323,8 +323,14 @@ static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
  
  void sha1_file_name(struct strbuf *buf, const unsigned char *sha1)

  {
-   strbuf_addf(buf, "%s/", get_object_directory());
+   const char *obj_dir = get_object_directory();
+   size_t extra = strlen(obj_dir) + 1 + GIT_MAX_HEXSZ;


Very minor nit.  Should this be "+3" rather than "+1"?
One for the slash after obj_dir, one for the slash between
"xx/y[38]", and one for the trailing NUL.

  
+	if (extra > strbuf_avail(buf))

+   strbuf_grow(buf, extra);
+
+   strbuf_addstr(buf, obj_dir);
+   strbuf_addch(buf, '/');
fill_sha1_path(buf, sha1);
  }
  



Re: [PATCH v2 2/2] sha1_file: improve sha1_file_name() perfs

2018-01-17 Thread Junio C Hamano
Jeff Hostetler  writes:

>> void sha1_file_name(struct strbuf *buf, const unsigned char *sha1)
>>   {
>> -strbuf_addf(buf, "%s/", get_object_directory());
>> +const char *obj_dir = get_object_directory();
>> +size_t extra = strlen(obj_dir) + 1 + GIT_MAX_HEXSZ;
>
> Very minor nit.  Should this be "+3" rather than "+1"?
> One for the slash after obj_dir, one for the slash between
> "xx/y[38]", and one for the trailing NUL.
>
>>   +  if (extra > strbuf_avail(buf))
>> +strbuf_grow(buf, extra);

The callers who care use static strbuf with 1/2, which lets them
grow it to an appropriate size after they make their first call.

On the other hand, the ones to which performance does not matter by
definition do not care.

I actually think this whole "extra -> grow" business should be
discarded.  With a miscomputed "extra" like this, it does not help
anybody---everybody may pay cost for one extra realloc due to the
miscalculation, and the ones that do care also do during their first
call.




Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Jeff King
On Wed, Jan 17, 2018 at 07:48:28PM +0100, Christoph Hellwig wrote:

> fsync is required for data integrity as there is no guarantee that
> data makes it to disk at any specified time without it.  Even for
> ext3 with data=ordered mode the file system will only commit all
> data at some point in time that is not guaranteed.
> 
> I've lost data on development machines with various times countless
> times due to the lack of this option, and now lost trees on a
> git server with ext4 as well yesterday.  It's time to make git
> safe by default.

I'm definitely sympathetic, and I've contemplated a patch like this a
few times. But I'm not sure we're "safe by default" here after this
patch. In particular:

  1. This covers only loose objects. We generally sync pack writes
 already, so we're covered there. But we do not sync ref updates at
 all, which we'd probably want to in a default-safe setup (a common
 post-crash symptom I've seen is zero-length ref files).

  2. Is it sufficient to fsync() the individual file's descriptors?
 We often do other filesystem operations (like hardlinking or
 renaming) that also need to be committed to disk before an
 operation can be considered saved.

  3. Related to (2), we often care about the order of metadata commits.
 E.g., a common sequence is:

   a. Write object contents to tempfile.

   b. rename() or hardlink tempfile to final name.

   c. Write object name into ref.lock file.

   d. rename() ref.lock to ref

 If we see (d) but not (b), then the result is a corrupted
 repository. Is this guaranteed by ext4 journaling with
 data=ordered?

It may be that data=ordered gets us what we need for (2) and (3). But I
think at the very least we should consider fsyncing ref updates based on
a config option, too.

-Peff


Re: [PATCH v2 2/2] send-email: Support separate Reply-To address

2018-01-17 Thread Junio C Hamano
Christian Ludwig  writes:

> In some projects contributions from groups are only accepted from a
> common group email address. But every individual may want to receive
> replies to her own personal address. That's what we have 'Reply-To'
> headers for in SMTP.
>
> Introduce an optional '--reply-to' command line option. Unfortunately
> the $reply_to variable name was already taken for the 'In-Reply-To'
> header field. To reduce code churn, use $reply_address as variable
> name instead.
>
> Signed-off-by: Christian Ludwig 
> ---
>  Documentation/git-send-email.txt   |  5 +
>  contrib/completion/git-completion.bash |  2 +-
>  git-send-email.perl| 18 +-
>  t/t9001-send-email.sh  |  2 ++
>  4 files changed, 25 insertions(+), 2 deletions(-)

Thanks.

While merging this with other topics in flight on 'pu', there were
minor merge conflicts, especially with np/send-email-header-parsing
that ends at b6049542 ("send-email: extract email-parsing code into
a subroutine", 2017-12-15) that attempts to refactor the code that
reads the header lines.  As there is *no* real change that benefits
by the refactoring accompanying the topic, it was a bit hard for me
to say if it is needless code churn or if it is a good refactoring.

I wonder if this change can be a good demonstration to measure the
goodness of it.  IOW, how would these two patches look if rebased on
the result of merging b6049542 to today's 'master'?  Would it make
these two patches cleaner and easier to grok?




Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Christoph Hellwig
On Wed, Jan 17, 2018 at 03:55:09PM -0500, Jeff King wrote:
> I'm definitely sympathetic, and I've contemplated a patch like this a
> few times. But I'm not sure we're "safe by default" here after this
> patch. In particular:
> 
>   1. This covers only loose objects. We generally sync pack writes
>  already, so we're covered there. But we do not sync ref updates at
>  all, which we'd probably want to in a default-safe setup (a common
>  post-crash symptom I've seen is zero-length ref files).

I've not seen them myself yet, but yes, they need an fsync.

>   2. Is it sufficient to fsync() the individual file's descriptors?
>  We often do other filesystem operations (like hardlinking or
>  renaming) that also need to be committed to disk before an
>  operation can be considered saved.

No, for metadata operations we need to fsync the directory as well.

>   3. Related to (2), we often care about the order of metadata commits.
>  E.g., a common sequence is:
> 
>a. Write object contents to tempfile.
> 
>b. rename() or hardlink tempfile to final name.
> 
>c. Write object name into ref.lock file.
> 
>d. rename() ref.lock to ref
> 
>  If we see (d) but not (b), then the result is a corrupted
>  repository. Is this guaranteed by ext4 journaling with
>  data=ordered?

It is not generally guaranteed by Linux file system semantics.  Various
file systems will actually start writeback of file data before rename,
but not actually wait on it.


Re: [PATCH v2 03/18] ref-filter: make valid_atom as function parameter

2018-01-17 Thread Jeff King
On Tue, Jan 16, 2018 at 09:55:22AM +0300, Оля Тележная wrote:

> > IOW, the progression I'd expect in a series like this is:
> >
> >   1. Teach ref-filter.c to support everything that cat-file can do.
> >
> >   2. Convert cat-file to use ref-filter.c.
> 
> I agree, I even made this and it's working fine:
> https://github.com/git/git/pull/450/commits/1b74f1047f07434dccb207534d1ad45a143e3f2b
> But I decided not to add that to patch because I expand the
> functionality of several commands (not only cat-file and
> for-each-ref), and I need to support all new functionality in a proper
> way, make these error messages, test everything and - the hardest one
> - support many new commands for cat-file. As I understand, it is not
> possible unless we finally move to ref-filter and print results also
> there. Oh, and I also need to rewrite docs in that case. And I decided
> to apply this in another patch. But, please, say your opinion, maybe
> we could do that here in some way.

Yeah, I agree that it will cause changes to other users of ref-filter,
and you'd need to deal with documentation and tests there. But I think
we're going to have to do those things eventually (since supporting
those extra fields everywhere is part of the point of the project). And
by doing them now, I think it can make the transition for cat-file a lot
simpler, because we never have to puzzle over this intermediate state
where only some of the atoms are valid for cat-file.

I also agree that moving cat-file's printing over to ref-filter.c is
going to introduce a lot of corner cases (like the behavior when
somebody does "cat-file --batch-check=%(refname)" with a bare sha1). But
I think a progression of patches handling those cases would be pretty
easy to follow. E.g., I'd expect to see a patch that teaches
ref-filter[1] how to handle callers that don't have a ref, but just a
bare object name. And it would be easier to evaluate that on its own,
because we know that's an end state we're going to have to handle, and
not some funny state created as part of the transition.

-Peff

[1] And here's where we might start to think about calling it something
besides ref-filter, if we don't necessarily have a ref. :)


Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Ævar Arnfjörð Bjarmason

On Wed, Jan 17 2018, Junio C. Hamano jotted:

> Christoph Hellwig  writes:
>
>> fsync is required for data integrity as there is no gurantee that
>> data makes it to disk at any specified time without it.  Even for
>> ext3 with data=ordered mode the file system will only commit all
>> data at some point in time that is not guaranteed.
>
> It comes from this one:
>
> commit aafe9fbaf4f1d1f27a6f6e3eb3e246fff81240ef
> Author: Linus Torvalds 
> Date:   Wed Jun 18 15:18:44 2008 -0700
>
> Add config option to enable 'fsync()' of object files
>
> As explained in the documentation[*] this is totally useless on
> filesystems that do ordered/journalled data writes, but it can be a
> useful safety feature on filesystems like HFS+ that only journal the
> metadata, not the actual file contents.
>
> It defaults to off, although we could presumably in theory some day
> auto-enable it on a per-filesystem basis.
>
> [*] Yes, I updated the docs for the thing.  Hell really _has_ frozen
> over, and the four horsemen are probably just beyond the horizon.
> EVERYBODY PANIC!
>
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index 0e25b2c92..9a1cec5c8 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -866,10 +866,8 @@ core.whitespace::
>>  core.fsyncObjectFiles::
>>  This boolean will enable 'fsync()' when writing object files.
>>  +
>> -This is a total waste of time and effort on a filesystem that orders
>> -data writes properly, but can be useful for filesystems that do not use
>> -journalling (traditional UNIX filesystems) or that only journal metadata
>> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
>> +This option is enabled by default and ensures actual data integrity
>> +by calling fsync after writing object files.
>
> I am somewhat sympathetic to the desire to flip the default to
> "safe" and allow those who know they are already safe to tweak the
> knob for performance, and it also makes sense to document that the
> default is "true" here.  But I do not see the point of removing the
> four lines from this paragraph; the sole effect of the removal is to
> rob information from readers that they can use to decide if they
> want to disable the configuration, no?

[CC'd the author of the current behavior]

Some points/questions:

 a) Is there some reliable way to test whether this is needed from
userspace? I'm thinking something like `git update-index
--test-untracked-cache` but for fsync().

 b) On the filesystems that don't need this, what's the performance
impact?

I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered
on the tests I thought might do a lot of loose object writes:

  $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux \
    GIT_PERF_MAKE_OPTS="NO_OPENSSL=Y CFLAGS=-O3 -j56" \
    ./run origin/master fsync-on~ fsync-on p3400-rebase.sh p0007-write-cache.sh
  [...]
  Test                                                          fsync-on~         fsync-on
  ---------------------------------------------------------------------------------------------------
  3400.2: rebase on top of a lot of unrelated changes           1.45(1.30+0.17)   1.45(1.28+0.20) +0.0%
  3400.4: rebase a lot of unrelated changes without split-index 4.34(3.71+0.66)   4.33(3.69+0.66) -0.2%
  3400.6: rebase a lot of unrelated changes with split-index    3.38(2.94+0.47)   3.38(2.93+0.47) +0.0%
  0007.2: write_locked_index 3 times (3214 files)               0.01(0.00+0.00)   0.01(0.00+0.00) +0.0%

   No impact. However I did my own test of running the test suite 10%
   times with/without this patch, and it runs 9% slower:

 fsync-off: avg:21.59 21.50 21.50 21.52 21.53 21.54 21.57 21.59 21.61 21.63 21.95
  fsync-on: avg:23.43 23.21 23.25 23.26 23.26 23.27 23.32 23.49 23.51 23.83 23.88

   Test script at the end of this E-Mail.

 c) What sort of guarantees in this regard do NFS-mounted filesystems
commonly make?

Test script:

use v5.10.0;
use strict;
use warnings;
use Time::HiRes qw(time);
use List::Util qw(sum);
use Data::Dumper;

my %time;
for my $ref (@ARGV) {
    system "git checkout $ref";
    system qq[make -j56 CFLAGS="-O3 -g" NO_OPENSSL=Y all];
    for (1..10) {
        my $t0 = -time();
        system "(cd t && NO_SVN_TESTS=1 GIT_TEST_HTTPD=0 prove -j56 --state=slow,save t[0-9]*.sh)";
        $t0 += time();
        push @{$time{$ref}} => $t0;
    }
}
for my $ref (sort keys %time) {
    printf "%20s: avg:%.2f %s\n",
        $ref,
        sum(@{$time{$ref}})/@{$time{$ref}},
        join(" ", map { sprintf "%.02f", $_ } sort { $a <=> $b } @{$time{$ref}});
}


Re: [PATCH v2 2/2] sha1_file: improve sha1_file_name() perfs

2018-01-17 Thread Christian Couder
On Wed, Jan 17, 2018 at 9:37 PM, Jeff Hostetler  wrote:
>
>
> On 1/17/2018 12:54 PM, Christian Couder wrote:
>>
>> As sha1_file_name() could be performance sensitive, let's
>> try to make it faster by seeding the initial buffer size
>> to avoid any need to realloc and by using strbuf_addstr()
>> and strbuf_addch() instead of strbuf_addf().
>>
>> Helped-by: Derrick Stolee 
>> Helped-by: Jeff Hostetler 
>> Signed-off-by: Christian Couder 
>> ---
>>   sha1_file.c | 8 +++-
>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/sha1_file.c b/sha1_file.c
>> index f66c21b2da..1a94716962 100644
>> --- a/sha1_file.c
>> +++ b/sha1_file.c
>> @@ -323,8 +323,14 @@ static void fill_sha1_path(struct strbuf *buf, const
>> unsigned char *sha1)
>> void sha1_file_name(struct strbuf *buf, const unsigned char *sha1)
>>   {
>> -   strbuf_addf(buf, "%s/", get_object_directory());
>> +   const char *obj_dir = get_object_directory();
>> +   size_t extra = strlen(obj_dir) + 1 + GIT_MAX_HEXSZ;
>
> Very minor nit.  Should this be "+3" rather than "+1"?
> One for the slash after obj_dir, one for the slash between
> "xx/y[38]", and one for the trailing NUL.

I think the trailing NUL is handled by strbuf_grow(), but I forgot
about the / between "xx/y[38]", so I think it should be "+2" .

Anyway I agree with Junio that using strbuf_grow() is not really needed.

>> +   if (extra > strbuf_avail(buf))
>> +           strbuf_grow(buf, extra);
>> +
>> +   strbuf_addstr(buf, obj_dir);
>> +   strbuf_addch(buf, '/');
>> fill_sha1_path(buf, sha1);
>>   }


Re: [PATCH v2 04/18] cat-file: move struct expand_data into ref-filter

2018-01-17 Thread Jeff King
On Tue, Jan 16, 2018 at 10:00:42AM +0300, Оля Тележная wrote:

> > I think some of these will want to remain in cat-file.c. For instance,
> > split_on_whitespace is not something that ref-filter.c itself would care
> > about. I'd expect in the endgame for cat-file to keep its own
> > split_on_whitespace flag, and to set it based on the presence of the
> > %(rest) flag, which it can check by seeing if the "rest" atom is in the
> > "used_atom" list after parsing the format.
> 
> Yes, maybe we will need to leave some part of expand_data variables.
> But, if it would be only "split_on_whitespace", it's better to make
> just one flag without any other stuff. Actually, I thought about
> moving related logic to ref-filter also. Anyway, it's hard to say
> exact things before we code everything. Do I need to fix commit
> message and make it more soft?

Yeah, I think it might just be split_on_whitespace, and that could live
on as a single variable in cat-file.c.

I don't think it makes sense to move that part to ref-filter, since it's
not about formatting at all. It's about how cat-file parses its input,
which is going to be unlike other ref-filter users (which don't
generally parse input at all).

-Peff


Re: [PATCH v2 01/18] cat-file: split expand_atom into 2 functions

2018-01-17 Thread Jeff King
On Tue, Jan 16, 2018 at 10:22:23AM +0300, Оля Тележная wrote:

> >> In other words, I think the endgame is that expand_atom() isn't there at
> >> all, and we're calling the equivalent of format_ref_item() for each
> >> object (except that in a unified formatting world, it probably doesn't
> >> have the word "ref" in it, since that's just one of the items a caller
> >> might pass in).
> 
> Agree! I want to merge current edits, then create format.h file and
> make some renames, then finish migrating process to new format.h and
> support all new meaningful tags.

I think we have a little bit of chicken and egg there, though. I'm
having trouble reviewing the current work, because it's hard to evaluate
whether it's doing the right thing without seeing the end state. So what
I was suggesting in my earlier mails was that we actually _not_ try to
merge this series, but use its components and ideas to build a new
series that does things in a bit different order.

Some of the patches here would end up thrown out, and some would get
cherry-picked into the new series. And some would have some conflicts
and need to get parts rewritten to fit into the new order.

-Peff


Re: [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files"

2018-01-17 Thread Junio C Hamano
Michael Haggerty  writes:

> So I will follow up this email with three patches:
>
> 1. Mention that `snapshot::buf` can be NULL for empty files
>
>I suggest squashing this into your patch, to make it clear that
>`snapshot::buf` and `snapshot::eof` can also be NULL if the
>`packed-refs` file is empty.
>
> 2. create_snapshot(): exit early if the file was empty
>
>Avoid undefined behavior by returning early if `snapshot->buf` is
>NULL.
>
> 3. find_reference_location(): don't invoke if `snapshot->buf` is NULL
>
>Avoid undefined behavior and confusing semantics by not calling
>`find_reference_location()` when `snapshot->buf` is NULL.

These look all sensible with today's code and with v2 from this
thread.

With the v3, i.e. "do the xmalloc() even for size==0", however,
snapshot->buf would never be NULL, so I'd shelve them for now,
though.

Thanks.


Re: [PATCH v2 2/2] send-email: Support separate Reply-To address

2018-01-17 Thread Ævar Arnfjörð Bjarmason

On Wed, Jan 17 2018, Christian Ludwig jotted:

> +if (defined $reply_to) {
> +     $reply_to =~ s/^\s+|\s+$//g;
> +     ($reply_to) = expand_aliases($reply_to);
> +     $reply_to = sanitize_address($reply_to);
> +}

Just a note to other reviewers: I found it odd to call a function with
one argument and then enforce list context, but I see expand_aliases()
expects to be called that way, and this is copied from a previous
pattern earlier in the file.

(As an aside, that code looks a bit odd, but then I see 302e04ea4d
("send-email: detect cycles in alias expansion", 2009-07-23) )

Looks good to me.


Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Linus Torvalds
On Wed, Jan 17, 2018 at 1:44 PM, Ævar Arnfjörð Bjarmason
 wrote:
>
> I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered
> on the tests I thought might do a lot of loose object writes:
>
>   $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux \
>     GIT_PERF_MAKE_OPTS="NO_OPENSSL=Y CFLAGS=-O3 -j56" \
>     ./run origin/master fsync-on~ fsync-on p3400-rebase.sh p0007-write-cache.sh
>   [...]
>   Test                                                          fsync-on~         fsync-on
>   ---------------------------------------------------------------------------------------------------
>   3400.2: rebase on top of a lot of unrelated changes           1.45(1.30+0.17)   1.45(1.28+0.20) +0.0%
>   3400.4: rebase a lot of unrelated changes without split-index 4.34(3.71+0.66)   4.33(3.69+0.66) -0.2%
>   3400.6: rebase a lot of unrelated changes with split-index    3.38(2.94+0.47)   3.38(2.93+0.47) +0.0%
>   0007.2: write_locked_index 3 times (3214 files)               0.01(0.00+0.00)   0.01(0.00+0.00) +0.0%
>
>No impact. However I did my own test of running the test suite 10%
>times with/without this patch, and it runs 9% slower:
>
>  fsync-off: avg:21.59 21.50 21.50 21.52 21.53 21.54 21.57 21.59 21.61 21.63 21.95
>   fsync-on: avg:23.43 23.21 23.25 23.26 23.26 23.27 23.32 23.49 23.51 23.83 23.88

That's not the thing you should check.

Now re-do the test while another process writes to a totally unrelated
huge file (say, do an ISO file copy or something).

That was the thing that several filesystems get completely and
horribly wrong. Generally _particularly_ the logging filesystems that
don't even need the fsync, because they use a single log for
everything (so fsync serializes all the writes, not just the writes to
the one file it's fsync'ing).

The original git design was very much to write each object file
without any syncing, because they don't matter since a new object file
- by definition - isn't really reachable. Then sync before writing the
index file or a new ref.

But things have changed, I'm not arguing that the code shouldn't be
made safe by default. I personally refuse to use rotating media on my
machines anyway, largely exactly because of the fsync() issue (things
like "firefox" started doing fsync on the mysql database for stupid
things, and you'd get huge pauses).

But I do think your benchmark is wrong. The case where only git does
something is not interesting or relevant. It really is "git does
something _and_ somebody else writes something entirely unrelated at
the same time" that matters.

  Linus


Re: [PATCH v3] packed_ref_cache: don't use mmap() for small files

2018-01-17 Thread Jeff King
On Tue, Jan 16, 2018 at 08:38:15PM +0100, Kim Gybels wrote:

> Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for
> small files, 2010-02-21) and use read() instead of mmap() for small
> packed-refs files.
> 
> This also fixes the problem[1] where xmmap() returns NULL for zero
> length[2], for which munmap() later fails.
> 
> Alternatively, we could simply check for NULL before munmap(), or
> introduce xmunmap() that could be used together with xmmap(). However,
> always setting snapshot->buf to a valid pointer, by relying on
> xmalloc(0)'s fallback to 1-byte allocation, makes using snapshots
> easier.
> 
> [1] https://github.com/git-for-windows/git/issues/1410
> [2] Logic introduced in commit 9130ac1e196 (Better error messages for
> corrupt databases, 2007-01-11)
> 
> Signed-off-by: Kim Gybels 
> ---
> 
> Change since v2: removed separate case for zero length as suggested by Peff,
> ensuring that snapshot->buf is always a valid pointer.

Thanks, this looks fine to me (I'd be curious to hear from Michael if
this eliminates the need for the other patches).

-Peff


Re: [PATCH v5 4/7] trace.c: introduce trace_run_command()

2018-01-17 Thread Jeff King
On Mon, Jan 15, 2018 at 05:59:46PM +0700, Nguyễn Thái Ngọc Duy wrote:

> diff --git a/trace.c b/trace.c
> index 7f3b08e148..da3db301e7 100644
> --- a/trace.c
> +++ b/trace.c
> @@ -23,6 +23,7 @@
>  
>  #include "cache.h"
>  #include "quote.h"
> +#include "run-command.h"
>  
>  struct trace_key trace_default_key = { "GIT_TRACE", 0, 0, 0 };
>  struct trace_key trace_perf_key = TRACE_KEY_INIT(PERFORMANCE);
> @@ -275,6 +276,21 @@ void trace_performance_fl(const char *file, int line, 
> uint64_t nanos,
>  #endif /* HAVE_VARIADIC_MACROS */
>  
>  
> +void trace_run_command(const struct child_process *cp)
> +{
> +     struct strbuf buf = STRBUF_INIT;
> +
> +     if (!prepare_trace_line(__FILE__, __LINE__,
> +                             &trace_default_key, &buf))
> +             return;

I think technically this should be TRACE_CONTEXT instead of __FILE__,
though I wonder if anybody has ever set that (there is not even a
Makefile knob for it, so you'd have to -DTRACE_CONTEXT manually).

I actually wonder if it would be nicer to keep this whole thing in
run-command.c, and just prepare it all in a buffer. That's basically
what we're doing here anyway, and then we could get rid of the funny
__FILE__ stuff. I.e., something like:

  struct strbuf buf = STRBUF_INIT;

  if (!trace_want(&trace_default_key))
return;

  strbuf_addf(&buf, "trace: run_command: ");
  sq_quote_argv_pretty(&buf, cp->argv);

  trace_printf("%s", buf.buf);
  strbuf_release(&buf);

AFAICT we aren't really using any internals of trace.c in our new
function.  That would get rid of this __FILE__ bit, it would eliminate
the need for trace.h to know about "struct child_process", and it would
mean the output still says "run-command.c" in it.

-Peff


Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Linus Torvalds
On Wed, Jan 17, 2018 at 2:07 PM, Linus Torvalds
 wrote:
>
> The original git design was very much to write each object file
> without any syncing, because they don't matter since a new object file
> - by definition - isn't really reachable. Then sync before writing the
> index file or a new ref.

.. actually, I think it originally sync'ed only before starting to
actually remove objects due to repacking.

The theory was that you can lose the last commit series or whatever,
and have to git-fsck, and have to re-do it, but you could never lose
long-term work. If your machine crashes, you still remember what you
did just before the crash.

That theory may have been more correct back in the days than it is
now. People who use git might be less willing to treat it like a
filesystem that you can fsck than I was back 10+ ago.

It's worth noting that the commit that Junio pointed to (from 2008)
didn't actually change any behavior. It just allowed people who cared
to change behavior. The original "let's not waste time on fsync every
object write, because we can just re-create the state anyway" behavior
goes back to 2005.

  Linus


Re: [PATCH v2 2/2] sha1_file: improve sha1_file_name() perfs

2018-01-17 Thread Jeff King
On Wed, Jan 17, 2018 at 12:54:33PM -0800, Junio C Hamano wrote:

> Jeff Hostetler  writes:
> 
> >> void sha1_file_name(struct strbuf *buf, const unsigned char
> >> *sha1)
> >>   {
> >> -  strbuf_addf(buf, "%s/", get_object_directory());
> >> +  const char *obj_dir = get_object_directory();
> >> +  size_t extra = strlen(obj_dir) + 1 + GIT_MAX_HEXSZ;
> >
> > Very minor nit.  Should this be "+3" rather than "+1"?
> > One for the slash after obj_dir, one for the slash between
> > "xx/y[38]", and one for the trailing NUL.
> >
> >>   +if (extra > strbuf_avail(buf))
> >> +  strbuf_grow(buf, extra);
> 
> The callers who care use static strbuf with 1/2, which lets them
> grow it to an appropriate size after they make their first call.
> 
> On the other hand, the ones to which performance does not matter by
> definition do not care.
> 
> I actually think this whole "extra -> grow" business should be
> discarded.  With a miscomputed "extra" like this, it does not help
> anybody---everybody may pay cost for one extra realloc due to the
> miscalculation, and the ones that do care also do during their first
> call.

Let me second that. The diffstat here, along with the magic numbers, is
not really encouraging unless we have a demonstrable speedup. In which
case we can then measure and compare other approaches, like pushing a
static strbuf farther up the stack. But without that, it feels like
stumbling around in the dark.

-Peff


Re: [PATCH] packfile: use get_be64() for large offsets

2018-01-17 Thread Jeff King
On Wed, Jan 17, 2018 at 02:08:23PM -0500, Derrick Stolee wrote:

> The pack-index version 2 format uses two 4-byte integers in network-byte 
> order to represent one 8-byte value. The current implementation has several 
> code clones for stitching these integers together.

Yeah, this seems like a no-brainer, and the patch itself looks obviously
correct.

Please wrap your commit messages a bit shorter, though (say, between
60-72 chars).

-Peff


Re: [PATCH v2 03/18] ref-filter: make valid_atom as function parameter

2018-01-17 Thread Christian Couder
On Wed, Jan 17, 2018 at 10:43 PM, Jeff King  wrote:
> On Tue, Jan 16, 2018 at 09:55:22AM +0300, Оля Тележная wrote:
>
>> > IOW, the progression I'd expect in a series like this is:
>> >
>> >   1. Teach ref-filter.c to support everything that cat-file can do.
>> >
>> >   2. Convert cat-file to use ref-filter.c.
>>
>> I agree, I even made this and it's working fine:
>> https://github.com/git/git/pull/450/commits/1b74f1047f07434dccb207534d1ad45a143e3f2b

(Nit: it looks like the above link does not work any more, but it
seems that you are talking about the last patch on the catfile
branch.)

>> But I decided not to add that to patch because I expand the
>> functionality of several commands (not only cat-file and
>> for-each-ref), and I need to support all new functionality in a proper
>> way, make these error messages, test everything and - the hardest one
>> - support many new commands for cat-file. As I understand, it is not
>> possible unless we finally move to ref-filter and print results also
>> there. Oh, and I also need to rewrite docs in that case. And I decided
>> to apply this in another patch. But, please, say your opinion, maybe
>> we could do that here in some way.
>
> Yeah, I agree that it will cause changes to other users of ref-filter,
> and you'd need to deal with documentation and tests there. But I think
> we're going to have to do those things eventually (since supporting
> those extra fields everywhere is part of the point of the project). And
> by doing them now, I think it can make the transition for cat-file a lot
> simpler, because we never have to puzzle over this intermediate state
> where only some of the atoms are valid for cat-file.

I agree that you will have to deal with documentation and tests at one
point and that it could be a good idea to do that now.

I wonder if it is possible to add atoms one by one into ref-filter and
to add tests and documentation at the same time, instead of merging
cat-file atoms with ref-filter atoms in one big step.

When all the cat-file atoms have been added to ref-filter's
valid_atom, maybe you could add ref-filter's atoms to cat-file's
valid_cat_file_atom one by one and add tests and documentation at the
same time.

And then when ref-filter's valid_atom and cat-file's
valid_cat_file_atom are identical you can remove cat-file's
valid_cat_file_atom and maybe after that rename "ref-filter" to
"format".


Re: [PATCH v5 4/7] trace.c: introduce trace_run_command()

2018-01-17 Thread Junio C Hamano
Jeff King  writes:

>> +void trace_run_command(const struct child_process *cp)
>> +{
>> +    struct strbuf buf = STRBUF_INIT;
>> +
>> +    if (!prepare_trace_line(__FILE__, __LINE__,
>> +                            &trace_default_key, &buf))
>> +            return;
>
> I think technically this should be TRACE_CONTEXT instead of __FILE__,
> though I wonder if anybody has ever set that (there is not even a
> Makefile knob for it, so you'd have to -DTRACE_CONTEXT manually).

Oooh. Today I learned a new thing ;-)

> I actually wonder if it would be nicer to keep this whole thing in
> run-command.c, and just prepare it all in a buffer. That's basically
> what we're doing here anyway, and then we could get rid of the funny
> __FILE__ stuff. I.e., something like:
>
>   struct strbuf buf = STRBUF_INIT;
>
>   if (!trace_want(&trace_default_key))
>   return;
>
>   strbuf_addf(&buf, "trace: run_command: ");
>   sq_quote_argv_pretty(&buf, cp->argv);
>
>   trace_printf("%s", buf.buf);
>   strbuf_release(&buf);
>
> AFAICT we aren't really using any internals of trace.c in our new
> function.  That would get rid of this __FILE__ bit, it would eliminate
> the need for trace.h to know about "struct child_process", and it would
> mean the output still says "run-command.c" in it.

Sounds nice.


Re: [PATCH v2 01/18] cat-file: split expand_atom into 2 functions

2018-01-17 Thread Christian Couder
On Wed, Jan 17, 2018 at 10:49 PM, Jeff King  wrote:
> On Tue, Jan 16, 2018 at 10:22:23AM +0300, Оля Тележная wrote:
>
>> >> In other words, I think the endgame is that expand_atom() isn't there at
>> >> all, and we're calling the equivalent of format_ref_item() for each
>> >> object (except that in a unified formatting world, it probably doesn't
>> >> have the word "ref" in it, since that's just one of the items a caller
>> >> might pass in).
>>
>> Agree! I want to merge current edits, then create format.h file and
>> make some renames, then finish migrating process to new format.h and
>> support all new meaningful tags.
>
> I think we have a little bit of chicken and egg there, though. I'm
> having trouble reviewing the current work, because it's hard to evaluate
> whether it's doing the right thing without seeing the end state.

Yeah, to me it feels like you are at a middle point and there are many
ways to go forward.

As I wrote in another email though, I think it might be a good time to
consolidate new functionality by adding tests (and perhaps
documentation at the same time) for each new atom that is added to
ref-filter or cat-file. It will help you refactor the code and your
patch series later without breaking the new functionality.

> So what
> I was suggesting in my earlier mails was that we actually _not_ try to
> merge this series, but use its components and ideas to build a new
> series that does things in a bit different order.

Yeah, I think you will have to do that, but the tests that you can add
now for the new features will help you when you will build the new
series.

And hopefully it will not be too much work to create this new series
as you will perhaps be able to just use the interactive rebase to
build it.

I also don't think it's a big problem if the current patch series gets
quite long before you start creating a new series.


Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Ævar Arnfjörð Bjarmason

On Wed, Jan 17 2018, Linus Torvalds jotted:

> On Wed, Jan 17, 2018 at 1:44 PM, Ævar Arnfjörð Bjarmason
>  wrote:
>>
>> I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered
>> on the tests I thought might do a lot of loose object writes:
>>
>>   $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux \
>>     GIT_PERF_MAKE_OPTS="NO_OPENSSL=Y CFLAGS=-O3 -j56" \
>>     ./run origin/master fsync-on~ fsync-on p3400-rebase.sh p0007-write-cache.sh
>>   [...]
>>   Test                                                          fsync-on~         fsync-on
>>   ---------------------------------------------------------------------------------------------------
>>   3400.2: rebase on top of a lot of unrelated changes           1.45(1.30+0.17)   1.45(1.28+0.20) +0.0%
>>   3400.4: rebase a lot of unrelated changes without split-index 4.34(3.71+0.66)   4.33(3.69+0.66) -0.2%
>>   3400.6: rebase a lot of unrelated changes with split-index    3.38(2.94+0.47)   3.38(2.93+0.47) +0.0%
>>   0007.2: write_locked_index 3 times (3214 files)               0.01(0.00+0.00)   0.01(0.00+0.00) +0.0%
>>
>>No impact. However I did my own test of running the test suite 10%
>>times with/without this patch, and it runs 9% slower:

That should be "10 times" b.t.w., not "10% times"

>>
>>  fsync-off: avg:21.59 21.50 21.50 21.52 21.53 21.54 21.57 21.59 21.61 21.63 21.95
>>   fsync-on: avg:23.43 23.21 23.25 23.26 23.26 23.27 23.32 23.49 23.51 23.83 23.88
>
> That's not the thing you should check.
>
> Now re-do the test while another process writes to a totally unrelated
> a huge file (say, do a ISO file copy or something).
>
> That was the thing that several filesystems get completely and
> horribly wrong. Generally _particularly_ the logging filesystems that
> don't even need the fsync, because they use a single log for
> everything (so fsync serializes all the writes, not just the writes to
> the one file it's fsync'ing).
>
> The original git design was very much to write each object file
> without any syncing, because they don't matter since a new object file
> - by definition - isn't really reachable. Then sync before writing the
> index file or a new ref.
>
> But things have changed, I'm not arguing that the code shouldn't be
> made safe by default. I personally refuse to use rotating media on my
> machines anyway, largely exactly because of the fsync() issue (things
> like "firefox" started doing fsync on the mysql database for stupid
> things, and you'd get huge pauses).
>
> But I do think your benchmark is wrong. The case where only git does
> something is not interesting or relevant. It really is "git does
> something _and_ somebody else writes something entirely unrelated at
> the same time" that matters.

Yeah it's shitty, just a quick hack to get some numbers since there was
a discussion of performance, but neither your original patch nor this
thread had quoted any.

One thing you may have missed is that this is a parallel (56 tests at a
time) run of the full test suite. So there's plenty of other git
processes (and test setup/teardown) racing with any given git
process. Running the test suite in a loop like this gives me ~100K IO
ops/s & ~50% disk utilization.

Or does overall FS activity and raw throughput (e.g. with an ISO copy)
matter more than general FS contention?

Tweaking it to emulate this iso copy case, running another test with one
of these running concurrently:

# setup
dd if=/dev/urandom of=/tmp/fake.iso bs=1024 count=$((1000*1024))
# run in a loop (shuf to not always write the same thing)
while sleep 0.1; do shuf /tmp/fake.iso | pv >/tmp/fake.shuf.iso; done

Gives throughput that spikes to 100% (not consistently) and:

fsync-off: avg:36.37 31.74 33.83 35.12 36.19 36.32 37.04 37.34 37.71 37.93 40.43
 fsync-on: avg:38.09 34.56 35.14 35.69 36.41 36.41 37.96 38.25 40.45 41.44 44.59

~4.7% slower, vs. ~8.5% in my earlier 87h8rki2iu@evledraar.gmail.com
without that running.

Which is not an argument for / against this patch, but those numbers
seem significant, and generally if the entire test suite slows down by
that much there's going to be sub-parts of it that are much worse.

Which might be a reason to tread more carefully and if it *does* slow
things down perhaps do it with more granularity, e.g. turning it on in
git-receive-pack might be more sensible than in git-filter-branch.

I remember Roberto Tyley's BFG writing an amazing amount of loose
objects, but it doesn't seem to have an fsync() option, I wonder if
adding one would be a representative pathological test case.


Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Linus Torvalds
On Wed, Jan 17, 2018 at 3:16 PM, Ævar Arnfjörð Bjarmason
 wrote:
>
> Or does overall FS activity and raw throughput (e.g. with an ISO copy)
> matter more than general FS contention?

Traditionally, yes.

Also note that none of this is about "throughput". It's about waiting
for a second or two when you do a simple "git commit" or similar.

Also, hardware has changed. I haven't used a rotational disk in about
10 years now, and I don't worry about latencies so much more.

The fact that you get ~100k iops indicates that you probably aren't
using those stupid platters of rust either. So I doubt you can even
trigger the bad cases.

Linus


Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Theodore Ts'o
On Wed, Jan 17, 2018 at 02:07:22PM -0800, Linus Torvalds wrote:
> 
> Now re-do the test while another process writes to a totally unrelated
> a huge file (say, do a ISO file copy or something).
> 
> That was the thing that several filesystems get completely and
> horribly wrong. Generally _particularly_ the logging filesystems that
> don't even need the fsync, because they use a single log for
> everything (so fsync serializes all the writes, not just the writes to
> the one file it's fsync'ing).

Well, let's be fair; this is something *ext3* got wrong, and it was
the default file system back then.  All of the modern file systems now
do delayed allocation, which means that an fsync of one file doesn't
actually imply an fsync of another file.  Hence...

> The original git design was very much to write each object file
> without any syncing, because they don't matter since a new object file
> - by definition - isn't really reachable. Then sync before writing the
> index file or a new ref.

This isn't really safe any more.  Yes, there's a single log.  But
files which are subject to delayed allocation are in the page cache,
and just because you fsync the index file doesn't mean that the object
file is now written to disk.  It was true for ext3, but it's not true
for ext4, xfs, btrfs, etc.

The good news is that if you have another process downloading a huge
ISO image, the fsync of the index file won't force the ISO file to be
written out.  The bad news is that it won't force out the other git
object files, either.

Now, there is a potential downside of fsync'ing each object file, and
that is the cost of doing a CACHE FLUSH on a HDD is non-trivial, and
even on a SSD, it's not optimal to call CACHE FLUSH thousands of times
in a second.  So if you are creating thousands of tiny files, and you
fsync each one, each fsync(2) call is a serializing instruction, which
means it won't return until that one file is written to disk.  If you
are writing lots of small files on an HDD, you'll be bottlenecked to
around 30 files per second on a 5400 RPM drive, and this is true
regardless of which file system you use, because the bottleneck is
the CACHE FLUSH operation; how you organize the metadata and do the
block allocation is largely lost in the noise compared to the CACHE
FLUSH command, which serializes everything.
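
A rough back-of-the-envelope consistent with that 30-files-per-second
figure (the three-rotations-per-fsync cost is our assumption for the
data write, journal commit, and cache flush, not something stated
above):

```latex
5400~\frac{\text{rev}}{\text{min}} = 90~\frac{\text{rev}}{\text{s}},
\qquad
\frac{90~\text{rev/s}}{\approx 3~\text{rev per fsync}}
\approx 30~\frac{\text{fsyncs}}{\text{s}}
```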

There are solutions to this; you could simply not call fsync(2) a
thousand times, and instead write a pack file, and call fsync once on
the pack file.  That's probably the smartest approach.

You could also create a thousand threads, and call fsync(2) on those
thousand threads at roughly the same time.  Or you could use a
bleeding edge kernel with the latest AIO patch, and use the newly
added IOCB_CMD_FSYNC support.

But I'd simply recommend writing a pack and fsync'ing the pack,
instead of trying to write a gazillion object files.  (git-repack -A,
I'm looking at you)
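
The "write a pack, sync once" idea can be sketched in shell
(illustration only; the file names, the count, and the use of
coreutils' per-file sync are assumptions, not git's actual code path):

```shell
# sketch: write many loose "objects" with no per-file flush, then one
# combined "pack" file that pays for a single flush
dir=$(mktemp -d)
for i in $(seq 1 100); do
    echo data >"$dir/obj-$i"            # no fsync per object file
done
cat "$dir"/obj-* >"$dir/pack"           # one pack-like file
sync "$dir/pack" 2>/dev/null || sync    # one flush covers everything
count=$(ls "$dir" | grep -c '^obj-')
echo "flushed $count objects with one sync"
```

The per-object loop avoids a serializing CACHE FLUSH for each file;
only the final pack-like file is flushed.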

- Ted


Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-17 Thread Linus Torvalds
On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o  wrote:
>
> Well, let's be fair; this is something *ext3* got wrong, and it was
> the default file system back them.

I'm pretty sure reiserfs and btrfs did too..

  Linus


[RFC PATCH 2/2] http: support omitting data from traces

2018-01-17 Thread Jonathan Tan
GIT_TRACE_CURL provides a way to debug what is being sent and received
over HTTP, with automatic redaction of sensitive information. But it
also logs data transmissions, which significantly increases the log file
size, sometimes unnecessarily. Add an option "GIT_TRACE_CURL_NO_DATA" to
allow the user to omit such data transmissions.

Signed-off-by: Jonathan Tan 
---
 http.c  | 27 +++
 t/t5551-http-fetch-smart.sh | 12 
 2 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/http.c b/http.c
index 088cf70bf..32a823895 100644
--- a/http.c
+++ b/http.c
@@ -16,6 +16,7 @@
 #include "string-list.h"
 
 static struct trace_key trace_curl = TRACE_KEY_INIT(CURL);
+static int trace_curl_data = 1;
 static struct string_list cookies_to_redact = STRING_LIST_INIT_DUP;
 #if LIBCURL_VERSION_NUM >= 0x070a08
 long int git_curl_ipresolve = CURL_IPRESOLVE_WHATEVER;
@@ -695,24 +696,32 @@ static int curl_trace(CURL *handle, curl_infotype type, char *data, size_t size,
curl_dump_header(text, (unsigned char *)data, size, DO_FILTER);
break;
case CURLINFO_DATA_OUT:
-   text = "=> Send data";
-   curl_dump_data(text, (unsigned char *)data, size);
+   if (trace_curl_data) {
+   text = "=> Send data";
+   curl_dump_data(text, (unsigned char *)data, size);
+   }
break;
case CURLINFO_SSL_DATA_OUT:
-   text = "=> Send SSL data";
-   curl_dump_data(text, (unsigned char *)data, size);
+   if (trace_curl_data) {
+   text = "=> Send SSL data";
+   curl_dump_data(text, (unsigned char *)data, size);
+   }
break;
case CURLINFO_HEADER_IN:
text = "<= Recv header";
curl_dump_header(text, (unsigned char *)data, size, NO_FILTER);
break;
case CURLINFO_DATA_IN:
-   text = "<= Recv data";
-   curl_dump_data(text, (unsigned char *)data, size);
+   if (trace_curl_data) {
+   text = "<= Recv data";
+   curl_dump_data(text, (unsigned char *)data, size);
+   }
break;
case CURLINFO_SSL_DATA_IN:
-   text = "<= Recv SSL data";
-   curl_dump_data(text, (unsigned char *)data, size);
+   if (trace_curl_data) {
+   text = "<= Recv SSL data";
+   curl_dump_data(text, (unsigned char *)data, size);
+   }
break;
 
default:/* we ignore unknown types by default */
@@ -857,6 +866,8 @@ static CURL *get_curl_handle(void)
if (getenv("GIT_CURL_VERBOSE"))
curl_easy_setopt(result, CURLOPT_VERBOSE, 1L);
setup_curl_trace(result);
+   if (getenv("GIT_TRACE_CURL_NO_DATA"))
+   trace_curl_data = 0;
if (getenv("GIT_REDACT_COOKIES")) {
string_list_split(&cookies_to_redact,
  getenv("GIT_REDACT_COOKIES"), ',', -1);
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index 88bd9c094..0c62d00af 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -376,5 +376,17 @@ test_expect_success 'GIT_REDACT_COOKIES redacts cookies' '
! grep "Cookie:.*Secret=Bar" err
 '
 
+test_expect_success 'GIT_TRACE_CURL_NO_DATA prevents data from being traced' '
+   rm -rf clone &&
+   GIT_TRACE_CURL=true \
+   git clone $HTTPD_URL/smart/repo.git clone 2>err &&
+   grep "=> Send data" err &&
+
+   rm -rf clone &&
+   GIT_TRACE_CURL=true GIT_TRACE_CURL_NO_DATA=1 \
+   git clone $HTTPD_URL/smart/repo.git clone 2>err &&
+   ! grep "=> Send data" err
+'
+
 stop_httpd
 test_done
-- 
2.16.0.rc1.238.g530d649a79-goog
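
The environment-variable gating this patch adds can be illustrated
with a tiny sketch (this mimics only the switch logic, not the real
curl trace callback; the function below is made up):

```shell
# data lines are emitted only while GIT_TRACE_CURL_NO_DATA is unset
unset GIT_TRACE_CURL_NO_DATA
trace_data() {
    if [ -z "$GIT_TRACE_CURL_NO_DATA" ]; then
        echo "=> Send data"      # dumped only when the switch is unset
    fi
}
with_data=$(trace_data)
without_data=$(GIT_TRACE_CURL_NO_DATA=1; trace_data)
echo "with='$with_data' without='$without_data'"
```

Headers would still be logged in both modes; only the payload dumps
are suppressed.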



[RFC PATCH 0/2] Cookie redaction during GIT_TRACE_CURL

2018-01-17 Thread Jonathan Tan
Sometimes authentication information is sent over HTTP through cookies,
but when using GIT_TRACE_CURL, that information appears in logs. There
are some HTTP headers already redacted ("Authorization:" and
"Proxy-Authorization:") - the first patch extends such redaction to a
user-specified list.

I've also included another patch to allow omitting data transmissions
from the log when using GIT_TRACE_CURL. This reduces the information
logged to something similar to GIT_CURL_VERBOSE.
(As for why not use GIT_CURL_VERBOSE instead: it does not perform any
redaction, merely using curl's default logging mechanism.)

The patches are ready for merging, but I marked this as "RFC" just in
case there is a better way to accomplish this.

Jonathan Tan (2):
  http: support cookie redaction when tracing
  http: support omitting data from traces

 http.c  | 82 -
 t/t5551-http-fetch-smart.sh | 24 +
 2 files changed, 98 insertions(+), 8 deletions(-)

-- 
2.16.0.rc1.238.g530d649a79-goog



[RFC PATCH 1/2] http: support cookie redaction when tracing

2018-01-17 Thread Jonathan Tan
When using GIT_TRACE_CURL, Git already redacts the "Authorization:" and
"Proxy-Authorization:" HTTP headers. Extend this redaction to a
user-specified list of cookies, specified through the
"GIT_REDACT_COOKIES" environment variable.

Signed-off-by: Jonathan Tan 
---
 http.c  | 55 +
 t/t5551-http-fetch-smart.sh | 12 ++
 2 files changed, 67 insertions(+)

diff --git a/http.c b/http.c
index 597771271..088cf70bf 100644
--- a/http.c
+++ b/http.c
@@ -13,8 +13,10 @@
 #include "transport.h"
 #include "packfile.h"
 #include "protocol.h"
+#include "string-list.h"
 
 static struct trace_key trace_curl = TRACE_KEY_INIT(CURL);
+static struct string_list cookies_to_redact = STRING_LIST_INIT_DUP;
 #if LIBCURL_VERSION_NUM >= 0x070a08
 long int git_curl_ipresolve = CURL_IPRESOLVE_WHATEVER;
 #else
@@ -575,6 +577,54 @@ static void redact_sensitive_header(struct strbuf *header)
/* Everything else is opaque and possibly sensitive */
strbuf_setlen(header,  sensitive_header - header->buf);
strbuf_addstr(header, " ");
+   } else if (cookies_to_redact.nr &&
+  skip_prefix(header->buf, "Cookie:", &sensitive_header)) {
+   struct strbuf redacted_header = STRBUF_INIT;
+   char *cookie;
+
+   while (isspace(*sensitive_header))
+   sensitive_header++;
+
+   /*
+* The contents of header starting from sensitive_header will
+* subsequently be overridden, so it is fine to mutate this
+* string (hence the assignment to "char *").
+*/
+   cookie = (char *) sensitive_header;
+
+   while (cookie) {
+   char *equals;
+   char *semicolon = strstr(cookie, "; ");
+   if (semicolon)
+   *semicolon = 0;
+   equals = strchrnul(cookie, '=');
+   if (!equals) {
+   /* invalid cookie, just append and continue */
+   strbuf_addstr(&redacted_header, cookie);
+   continue;
+   }
+   *equals = 0; /* temporarily set to NUL for lookup */
+   if (string_list_lookup(&cookies_to_redact, cookie)) {
+   strbuf_addstr(&redacted_header, cookie);
+   strbuf_addstr(&redacted_header, "=");
+   } else {
+   *equals = '=';
+   strbuf_addstr(&redacted_header, cookie);
+   }
+   if (semicolon) {
+   /*
+* There are more cookies. (Or, for some
+* reason, the input string ends in "; ".)
+*/
+   strbuf_addstr(&redacted_header, "; ");
+   cookie = semicolon + strlen("; ");
+   } else {
+   cookie = NULL;
+   }
+   }
+
+   strbuf_setlen(header, sensitive_header - header->buf);
+   strbuf_addbuf(header, &redacted_header);
}
 }
 
@@ -807,6 +857,11 @@ static CURL *get_curl_handle(void)
if (getenv("GIT_CURL_VERBOSE"))
curl_easy_setopt(result, CURLOPT_VERBOSE, 1L);
setup_curl_trace(result);
+   if (getenv("GIT_REDACT_COOKIES")) {
+   string_list_split(&cookies_to_redact,
+ getenv("GIT_REDACT_COOKIES"), ',', -1);
+   string_list_sort(&cookies_to_redact);
+   }
 
curl_easy_setopt(result, CURLOPT_USERAGENT,
user_agent ? user_agent : git_user_agent());
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index a51b7e20d..88bd9c094 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -364,5 +364,17 @@ test_expect_success 'custom http headers' '
submodule update sub
 '
 
+test_expect_success 'GIT_REDACT_COOKIES redacts cookies' '
+   rm -rf clone &&
+   echo "Set-Cookie: NotSecret=Foo" >cookies &&
+   echo "Set-Cookie: Secret=Bar" >>cookies &&
+   GIT_TRACE_CURL=true GIT_REDACT_COOKIES=Secret,AnotherSecret \
+   git -c "http.cookieFile=$(pwd)/cookies" clone \
+   $HTTPD_URL/smart/repo.git clone 2>err &&
+   grep "Cookie:.*NotSecret=Foo" err &&
+   grep "Cookie:.*Secret=" err &&
+   ! grep "Cookie:.*Secret=Bar" err
+'
+
 stop_httpd
 test_done
-- 
2.16.0.rc1.238.g530d649a79-goog
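
For illustration, the redaction loop above can be restated in shell
(awk stands in for the strbuf manipulation; the helper name and the
sample cookies are made up):

```shell
# split the Cookie value on "; ", blank out values of listed cookie names
redact_cookies() {   # $1 = comma-separated names to redact, header on stdin
    awk -v names="$1" '{
        n = split(names, redact, ",")
        m = split($0, kv, "; ")
        for (i = 1; i <= m; i++) {
            split(kv[i], pair, "=")
            for (j = 1; j <= n; j++)
                if (pair[1] == redact[j])
                    kv[i] = pair[1] "="    # keep the name, drop the value
        }
        out = kv[1]
        for (i = 2; i <= m; i++)
            out = out "; " kv[i]
        print out
    }'
}
redacted=$(echo "NotSecret=Foo; Secret=Bar" | redact_cookies "Secret,AnotherSecret")
echo "$redacted"   # NotSecret=Foo; Secret=
```

As in the C version, a redacted cookie keeps its name so the trace
still shows that it was sent.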



[ANNOUNCE] Git v2.16.0

2018-01-17 Thread Junio C Hamano
The latest feature release Git v2.16.0 is now available at the
usual places.  It comprises 509 non-merge commits since
v2.15.0, contributed by 91 people, 26 of whom are new faces.

The tarballs are found at:

https://www.kernel.org/pub/software/scm/git/

The following public repositories all have a copy of the 'v2.16.0'
tag and the 'master' branch that the tag points at:

  url = https://kernel.googlesource.com/pub/scm/git/git
  url = git://repo.or.cz/alt-git.git
  url = https://github.com/gitster/git

New contributors whose contributions weren't in v2.15.0 are as follows.
Welcome to the Git development community!

  Albert Astals Cid, Antoine Beaupré, Damien Marié, Daniel
  Bensoussan, Fangyi Zhou, Florian Klink, Gennady Kupava, Guillaume
  Castagnino, Haaris Mehmood, Hans Jerry Illikainen, Ingo Ruhnke,
  Jakub Bereżański, Jean Carlo Machado, Julien Dusser, J Wyman,
  Kevin, Louis Bettens, Łukasz Stelmach, Marius Paliga, Olga
  Telezhnaya, Rafael Ascensão, Robert Abel, Robert P. J. Day,
  Shuyu Wei, Wei Shuyu, and Zhou Fangyi.

Returning contributors who helped this release are as follows.
Thanks for your continued support.

  Adam Dinwoodie, Ævar Arnfjörð Bjarmason, Alexander Shopov,
  Alex Vandiver, Anders Kaseorg, Andrey Okoshkin, Ann T Ropea,
  Beat Bolli, Ben Peart, Brandon Williams, brian m. carlson, Carlos
  Martín Nieto, Changwoo Ryu, Charles Bailey, Christian Couder,
  Christopher Díaz Riveros, Dave Borowitz, Dennis Kaarsemaker,
  Derrick Stolee, Dimitriy Ryazantcev, Elijah Newren, Emily Xie,
  Eric Sunshine, Eric Wong, Heiko Voigt, Jacob Keller, Jameson
  Miller, Jean-Noel Avila, Jeff Hostetler, Jeff King, Jiang Xin,
  Johannes Schindelin, Jonathan Nieder, Jonathan Tan, Jordi Mas,
  Junio C Hamano, Kaartic Sivaraam, Kevin Daudt, Lars Schneider,
  Liam Beguin, Luke Diamand, Martin Ågren, Michael Haggerty,
  Nicolas Morey-Chaisemartin, Peter Krefting, Phil Hord, Phillip
  Wood, Pranit Bauva, Prathamesh Chavan, Ralf Thielow, Ramsay
  Jones, Randall S. Becker, Rasmus Villemoes, René Scharfe,
  Simon Ruderich, Stefan Beller, Steffen Prohaska, Stephan Beyer,
  SZEDER Gábor, Thomas Braun, Thomas Gummerer, Todd Zullinger,
  Torsten Bögershausen, Trần Ngọc Quân, and W. Trevor King.



Git 2.16 Release Notes
======================

Backward compatibility notes and other notable changes.

 * Use of an empty string as a pathspec element that is used for
   'everything matches' is now an error.


Updates since v2.15
-------------------

UI, Workflows & Features

 * An empty string as a pathspec element that means "everything",
   e.g. 'git add ""', is now illegal.  We started down this path by
   first deprecating and warning about pathspecs containing such an
   element in 2.11 (Nov 2016).

 * A hook script that is not marked executable is simply ignored.  Git
   notifies when such a file is ignored, unless the message is
   squelched via the advice.ignoredHook configuration.

 * "git pull" has been taught to accept "--[no-]signoff" option and
   pass it down to "git merge".

 * The "--push-option=" option to "git push" now defaults to a
   list of strings configured via push.pushOption variable.

 * "gitweb" checks if a directory is searchable with Perl's "-x"
   operator, which can be enhanced by using the "filetest 'access'"
   pragma, which we now do.

 * "git stash save" has been deprecated in favour of "git stash push".

 * The set of paths output from "git status --ignored" was tied
   closely with its "--untracked=" option, but now it can be
   controlled more flexibly.  Most notably, a directory that is
   ignored because it is listed to be ignored in the ignore/exclude
   mechanism can be handled differently from a directory that ends up
   being ignored only because all files in it are ignored.

 * The remote-helper for talking to MediaWiki has been updated to
   truncate an overlong pagename so that ".mw" suffix can still be
   added.

 * The remote-helper for talking to MediaWiki has been updated to
   work with MediaWiki namespaces.

 * The "--format=..." option that "git for-each-ref" takes learned to
   show the name of the 'remote' repository and the ref at the remote
   side that is affected for 'upstream' and 'push' via
   "%(push:remotename)" and friends.

 * Doc and message updates to teach users "bisect view" is a synonym
   for "bisect visualize".

 * "git bisect run" that did not specify any command to run used to go
   ahead and treat all commits to be tested as 'good'.  This has
   been corrected by making the command error out.

 * The SubmittingPatches document has been converted to produce an
   HTML version via AsciiDoc/Asciidoctor.

 * We learned to optionally talk to a file system monitor via new
   fsmonitor extension to speed up "git status" and other operations
   that need to see which paths have been modified.  Currently we only
   support "watchman".  See File System Monitor section of
   git-update-index(1) for more detail.


Re: [PATCH v2 04/18] cat-file: move struct expand_data into ref-filter

2018-01-17 Thread Оля Тележная
2018-01-18 0:45 GMT+03:00 Jeff King :
> On Tue, Jan 16, 2018 at 10:00:42AM +0300, Оля Тележная wrote:
>
>> > I think some of these will want to remain in cat-file.c. For instance,
>> > split_on_whitespace is not something that ref-filter.c itself would care
>> > about. I'd expect in the endgame for cat-file to keep its own
>> > split_on_whitespace flag, and to set it based on the presence of the
>> > %(rest) flag, which it can check by seeing if the "rest" atom is in the
>> > "used_atom" list after parsing the format.
>>
>> Yes, maybe we will need to leave some part of expand_data variables.
>> But, if it would be only "split_on_whitespace", it's better to make
>> just one flag without any other stuff. Actually, I thought about
>> moving related logic to ref-filter also. Anyway, it's hard to say
>> exact things before we code everything. Do I need to fix commit
>> message and make it more soft?
>
> Yeah, I think it might just be split_on_whitespace, and that could live
> on as a single variable in cat-file.c.
>
> I don't think it makes sense to move that part to ref-filter, since it's
> not about formatting at all. It's about how cat-file parses its input,
> which is going to be unlike other ref-filter users (which don't
> generally parse input at all).
>
> -Peff

OK, I will leave split_on_whitespace in cat-file for now. :) There are
many places that are more important right now, in my opinion, so I am
planning to leave this variable alone, work on other things, and
return to this question later.


Re: [BUG] git remote prune removes local tags, depending on fetch config

2018-01-17 Thread Kevin Daudt
On Tue, Jan 16, 2018 at 12:48:32PM +0100, Andreas Schwab wrote:
> On Jan 15 2018, Michael Giuffrida  wrote:
> 
> > It doesn't seem like a useful feature -- you wouldn't expect `git
> > fetch --prune` to remove your local branches that you were developing
> > on, right?
> 
> Don't mix local and remote refs.  There is a reason why remote tracking
> branches are put in a separate name space.  If you fetch the remote tags
> into a separate name space (eg. refs/remote/tags/*:refs/tags/*) then
> there is no conflict.
> 
> Andreas.

But then they are no longer considered tags, but remote tracking
branches. Tools like git tag and git describe won't consider them, and
git branch -r would show them as remote tracking branches.

> 
> -- 
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."
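
Under Andreas's scheme the remote configuration might look roughly
like this (a sketch with a made-up URL; note that fetch refspecs are
written <src>:<dst>, so the remote's refs/tags/* are fetched into a
local remote-tracking namespace instead of refs/tags/):

```shell
# sketch of the [remote] section such a separate-namespace setup implies
cat >/tmp/demo-git-config <<'EOF'
[remote "origin"]
	url = https://example.com/repo.git
	fetch = +refs/heads/*:refs/remotes/origin/*
	fetch = +refs/tags/*:refs/remotes/origin/tags/*
EOF
tagspec=$(grep 'refs/tags' /tmp/demo-git-config)
echo "$tagspec"
```

As Kevin points out, refs fetched this way would no longer be seen as
tags by tools like "git tag" or "git describe".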


Re: [PATCH v2 03/18] ref-filter: make valid_atom as function parameter

2018-01-17 Thread Оля Тележная
2018-01-18 1:39 GMT+03:00 Christian Couder :
> On Wed, Jan 17, 2018 at 10:43 PM, Jeff King  wrote:
>> On Tue, Jan 16, 2018 at 09:55:22AM +0300, Оля Тележная wrote:
>>
>>> > IOW, the progression I'd expect in a series like this is:
>>> >
>>> >   1. Teach ref-filter.c to support everything that cat-file can do.
>>> >
>>> >   2. Convert cat-file to use ref-filter.c.
>>>
>>> I agree, I even made this and it's working fine:
>>> https://github.com/git/git/pull/450/commits/1b74f1047f07434dccb207534d1ad45a143e3f2b
>
> (Nit: it looks like the above link does not work any more, but it
> seems that you are talking about the last patch on the catfile
> branch.)
>
>>> But I decided not to add that to patch because I expand the
>>> functionality of several commands (not only cat-file and
>>> for-each-ref), and I need to support all new functionality in a proper
>>> way, make these error messages, test everything and - the hardest one
>>> - support many new commands for cat-file. As I understand, it is not
>>> possible unless we finally move to ref-filter and print results also
>>> there. Oh, and I also need to rewrite docs in that case. And I decided
>>> to apply this in another patch. But, please, say your opinion, maybe
>>> we could do that here in some way.
>>
>> Yeah, I agree that it will cause changes to other users of ref-filter,
>> and you'd need to deal with documentation and tests there. But I think
>> we're going to have to do those things eventually (since supporting
>> those extra fields everywhere is part of the point of the project). And
>> by doing them now, I think it can make the transition for cat-file a lot
>> simpler, because we never have to puzzle over this intermediate state
>> where only some of the atoms are valid for cat-file.
>
> I agree that you will have to deal with documentation and tests at one
> point and that it could be a good idea to do that now.
>
> I wonder if it is possible to add atoms one by one into ref-filter and
> to add tests and documentation at the same time, instead of merging
> cat-file atoms with ref-filter atoms in one big step.
>
> When all the cat-file atoms have been added to ref-filter's
> valid_atom, maybe you could add ref-filter's atoms to cat-file's
> valid_cat_file_atom one by one and add tests and documentation at the
> same time.
>
> And then when ref-filter's valid_atom and cat-file's
> valid_cat_file_atom are identical you can remove cat-file's
> valid_cat_file_atom and maybe after that rename "ref-filter" to
> "format".

I think it's important to finish the migration process first. I mean,
right now we prepare and collect everything in ref-filter, but we
still build the resulting string and print it in cat-file. And I am
not sure, but maybe it will not be possible to start using new atoms
in cat-file while some of the logic still differs.
Another thought - we were thinking about creating format.h but decided
not to move forward with it, and now we are suffering because of that.
Can I create it right now, or would the commit history become too
messy? Also, do you mean just renaming ref-filter? I was thinking that
I need to move the formatting-related logic into another file and
leave all the other stuff in ref-filter.

Anyway, your suggested steps look good, and I am planning to implement
them later. Let's discuss what behavior we expect when an atom is
useless for a command: die or ignore? And which atoms are useless (as
I understand it, "rest" and "deltabase" from cat-file are useless for
all other ref-filter users; so the question is whether I am right
about that, and which ref-filter atoms are useless for cat-file).

I have never written any tests or docs for Git; I will try to work out
how to do that on my own, but if you have any helpful links or
materials, please send them to me :)

Olga


Re: [PATCH v2 01/18] cat-file: split expand_atom into 2 functions

2018-01-17 Thread Оля Тележная
2018-01-18 2:04 GMT+03:00 Christian Couder :
> On Wed, Jan 17, 2018 at 10:49 PM, Jeff King  wrote:
>> On Tue, Jan 16, 2018 at 10:22:23AM +0300, Оля Тележная wrote:
>>
>>> >> In other words, I think the endgame is that expand_atom() isn't there at
>>> >> all, and we're calling the equivalent of format_ref_item() for each
>>> >> object (except that in a unified formatting world, it probably doesn't
>>> >> have the word "ref" in it, since that's just one of the items a caller
>>> >> might pass in).
>>>
>>> Agree! I want to merge current edits, then create format.h file and
>>> make some renames, then finish migrating process to new format.h and
>>> support all new meaningful tags.
>>
>> I think we have a little bit of chicken and egg there, though. I'm
>> having trouble reviewing the current work, because it's hard to evaluate
>> whether it's doing the right thing without seeing the end state.
>
> Yeah, to me it feels like you are at a middle point and there are many
> ways to go forward.

OK. Maybe I misunderstood you and Jeff on our call - I thought making
a merge now was your idea, sorry. I will continue my work here.

>
> As I wrote in another email though, I think it might be a good time to
> consolidate new functionality by adding tests (and perhaps
> documentation at the same time) for each new atom that is added to
> ref-filter or cat-file. It will help you refactor the code and your
> patch series later without breaking the new functionality.
>
>> So what
>> I was suggesting in my earlier mails was that we actually _not_ try to
>> merge this series, but use its components and ideas to build a new
>> series that does things in a bit different order.
>
> Yeah, I think you will have to do that, but the tests that you can add
> now for the new features will help you when you will build the new
> series.
>
> And hopefully it will not be too much work to create this new series
> as you will perhaps be able to just use the interactive rebase to
> build it.
>
> I also don't think it's a big problem if the current patch series gets
> quite long before you start creating a new series.