Re: [RFC] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-03-07 Thread Johannes Schindelin
Hi Junio,

On Tue, 6 Mar 2018, Junio C Hamano wrote:

> Phillip Wood  writes:
> 
> > I wonder if just having a predicable result rather than forcing the
> > rebase to stop if the user just squashes a fixup commit into a topic
> > branch that is the parent of a merge might be more convenient in
> > practice.
> 
> Unless I am misunderstanding what you are saying, that is pretty much
> what I have automated for my daily rebuild of the 'pu' branch
> 
> Non-textual semantic conflicts are made (in the best case just once)
> as a separate commit on top of mechanical auto-merge whose focus is
> predictability (rather than cleverness) done by Git, and then that
> separate commit is kept outside the history.  When replaying these
> merges to rebuild the 'pu' branch, after resetting the tip to
> 'master', each topic is merged mechanically, and if such a fix-up
> commit is present, "cherry-pick --no-commit" applies it and then
> "commit --amend --no-edit" to adjust the merge.  I find it quite
> valuable to have a separate record of what "evil" non-mechanical
> adjustment was done, which I know won't be lost in the noise when
> these merges need to be redone daily or more often.

So essentially, you have something that `git rerere` would have learned,
but as a commit?

Might make sense to make that procedure more accessible to others, too ;-)
I know that *I* would use it.

Ciao,
Dscho


Re: [RFC] Rebasing merges: a jorney to the ultimate solution(RoadClear)

2018-03-07 Thread Johannes Schindelin
Hi Junio,

On Wed, 7 Mar 2018, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> >> OK, does this mean we want to wait before merging the "recreate
> >> merge" topic down to 'next'?  For more than a few weeks, it has been
> >> slated for 'next'.
> >
> > Maybe a few more days.
> > ...
> > I want to discuss this in the other subthread, though.
> 
> If we are talking about a drastic change, a few more days may not be
> sufficient, but we are not in a hurry, as this already sounds like a
> 2.18 material anyway.

It is not at all a drastic change. It will actually make the current patch
series better (simplifying the "can we fast-forward?" check).

I just want to make sure that I already have Phillip's strategy working,
but it will be yet another topic branch on top of the topic branch that
will add support for octopus merges *after* the current --recreate-merges
topic branch ;-)

> As you made it clear that it is OK not to merge the current one for now,
> my objective of asking the question is already satisfied ;-)

Depending how much GitMerge will occupy my time, I hope to have something
polished by tomorrow.

Ciao,
Dscho


Re: [RFC v2] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-03-07 Thread Johannes Schindelin
Hi Sergey,

On Wed, 7 Mar 2018, Sergey Organov wrote:

> Johannes Schindelin  writes:
> >
> > On Wed, 7 Mar 2018, Sergey Organov wrote:
> >
> >> Johannes Schindelin  writes:
> >> 
> >> > On Tue, 6 Mar 2018, Sergey Organov wrote:
> >> >
> >> >> This is v2 of my "Rebasing merges" proposal.
> >> >
> >> > Didn't we settle on Phillip's "perform successive three-way merges
> >> > between the original merge commit and the new tips with the old
> >> > tips as base" strategy?
> >> 
> >> It seems you did, dunno exactly why.
> >
> > That is not true. You make it sound like I was the only one who liked
> > this, and not Phillip and Buga, too.
> >
> > Are you interested in the best solution, or in your solution :-)
> 
> I'm interested in any that works, and only you say that those suggested
> by Phillip is somehow superior. I still believe it's mine that superior,
> even if slightly.

That is misrepresenting what happened.

First, you came up with a strategy. I pointed out shortcomings that
implied that we cannot use it unchanged. Then, Buga fixed your strategy by
using additional steps (making the process more complicated than before,
still without a simple-enough explanation for my liking, fixing the
shortcomings). Then, Phillip presented a super-simple strategy and Buga
confirmed that it also fixes the shortcomings I pointed out.

I am very excited that we finally found something that works *and* is easy
to reason about.

Let's focus on that strategy rather than going back to the strategy which
has known flaws and only an unsatisfyingly complex explanation.

> >> The main problem with this decision is that we still don't see how
> >> and when to stop for user amendment using this method. OTOH, the
> >> original has this issue carefully discussed.
> >
> > Why would we want to stop, unless there are merge conflicts?
> 
> There is somewhat lengthy discussion about it that you probably missed.
> Not to repeat it, just see how 'rerere' works when it fires during
> rebase, even with no conflicts.

I did not miss that discussion. My question was a follow-up: Why would we
want to stop, unless there are merge conflicts?

> >> > It has the following advantages:
> >> >
> >> > - it is *very simple* to describe
> >> 
> >> The original is as simple if not simpler:
> >> 
> >> "rebase sides of the merge commit and then three-way merge them back
> >> using original merge commit as base"
> >
> > And that is also wrong, as I had proved already! Only Buga's addition made
> > it robust against dropping/modifying commits, and that addition also makes
> > it more complicated.
> 
> No. Get your facts straight. The [RFC v2] already fixed that original
> mistake. Could you please finally read it?

I do not see Buga's additions, and it is still a lengthy document that is
hard to understand.

Phillip's alternative, in contrast, fit in at most two 80x25 pages and was
intuitive (at least after seeing that the tree resulting from a merge is
identical to the tree resulting from a rebase, once all merge conflicts
are handled appropriately).

> > And it still has no satisfactory simple explanation why it works.
> 
> It has. It's there in the [RFC v2]. You seem to be the only one who
> doesn't get it. I suppose you just didn't bother to read.

I tried to read it, and got lost in all those figures that really do not
do anything to make this strategy obvious to me.

Granted, I now know *how* it works.

I gave up understanding *why* it is supposed to work.

With Phillip's mail, it only took me 5 minutes to get a rudimentary
understanding why it works.

> >> No problems with octopuses, and no virtual merge bases of recursive
> >> merges to reason about.
> >
> > But those are no problems for Phillip's strategy, either!
> 
> I thought it was you who started to discuss virtual merge bases and
> related problems, as well as how it's difficult to support octopus
> merges, but it's fine with me if there are none of these problems.

Yes, I started explaining virtual merge bases, and how they are used in
the recursive merge. Because I was asked how the recursive merge works,
and I happen to know how it works very intimately.

> > So your point is...?
> 
> Still the same -- use what's better, the [RFC v2].

I strongly disagree that your approach is superior. It is more complex,
and still has no simple answer to the question "why is this supposed to do
what I want it to do?"

It does a lot of criss-crossing, and when I see that, I immediately
suspect that it would lose information e.g. when commits were amended
during the rebase. There are just way too many paths for obsolete changes
to creep in again.

> >> > - it is *very easy* to reason about, once it is pointed out that
> >> > rebases and merges result in the same trees.
> >> 
> >> The original is as easy to reason about, if not easier, especially as
> >> recursive merge strategy is not being used there in new ways.
> >
> > So do it. I still have to hear 

Re: [RFC] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-03-07 Thread Johannes Schindelin
Hi Sergey,

On Wed, 7 Mar 2018, Sergey Organov wrote:

> Johannes Schindelin  writes:
> 
> > How can your approach -- which relies *very much* on having the
> > original parent commits -- not *require* that consistency check?
> 
> I don't understand what you mean, sorry. Could you please point me to
> the *require* you talk about in the original proposal?

Imagine a todo list that contains this line

merge -C abcdef 123456

and now the user edits it (this is an interactive rebase, after all),
adding another merge head:

merge -C abcdef 987654 123456

Now your strategy would have a serious problem: to find the original
version of 987654. If there was one.

> > What would your approach (that still has no satisfyingly trivial
> > explanation, in my mind)
> 
> Here is one-liner: rebase sides of the merge commit and then 3-way
> merge them, using original merge commit as merge base.

But I already pointed out how that would undo a commit having been
dropped.

> > do if somebody edited a `merge` command and let it merge a completely
> > unrelated commit?
> 
> Don't see a problem, sorry. The method should still work, provided you have
> original merge commit and two new parents for the new merge.

That is assuming a lot. That is exactly what this consistency check is
for, that I mentioned earlier, and which you listed as a downside of
Phillip's strategy (forgetting that your strategy has the same downside,
so...).

But I guess that you are still talking about the non-interactive version
of the rebase, and missed that our conversation proceeded to the point
where we want that same strategy to work *also* in the interactive version
(and not have a completely different functionality depending whether you
use --interactive or not)?

Ciao,
Johannes


Hello

2018-03-07 Thread Bauer Mcdonalds
Good day dear, i hope this mail meets you well? I know this may seem 
inappropriate so 
i ask for your forgiveness but i wish to get to know you better, if I may be so 
bold. 
I consider myself an easy-going man, adventurous, honest and fun loving person 
but I 
am currently looking for a relationship in which I will feel loved. I promise 
to 
answer any question that you may want to ask me...all i need is just your 
attention 
and the chance to know you more.

Please tell me more about yourself, if you do not mind. Hope to hear back from 
you 
soon.

Bauer


I got a forced update after I run git pull --rebase

2018-03-07 Thread ZhenTian
I can't reproduct my issue, this is my first time, but my colleague
came across this issue several weeks ago.

After I pushed my commit to git server without rejection. I run git
pull --rebase, then I got a forced update, and my last commit is
missing.

I have asked a question on StackOverflow, there is more details about
this issue: 
https://stackoverflow.com/questions/49164906/why-git-pull-rebase-results-a-forced-update

I have no idea about this.

I'm using Ubuntu 16.04.4 with Git 2.16.2, PyCharm IDE.


Sincerely,
Tian Zhen


Re: [PATCH v10 3/9] strbuf: add a case insensitive starts_with()

2018-03-07 Thread Duy Nguyen
On Thu, Mar 8, 2018 at 12:30 AM,   wrote:
> From: Lars Schneider 
>
> Check in a case insensitive manner if one string is a prefix of another
> string.
>
> This function is used in a subsequent commit.
>
> Signed-off-by: Lars Schneider 
> ---
>  git-compat-util.h | 1 +
>  strbuf.c  | 9 +
>  2 files changed, 10 insertions(+)
>
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 68b2ad531e..f648da0c11 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -455,6 +455,7 @@ extern void (*get_warn_routine(void))(const char *warn, 
> va_list params);
>  extern void set_die_is_recursing_routine(int (*routine)(void));
>
>  extern int starts_with(const char *str, const char *prefix);
> +extern int startscase_with(const char *str, const char *prefix);

This name is a bit hard to read. Boost [1] goes with istarts_with. I
wonder if it's better. If not I guess either starts_with_case or
starts_case_with will improve readability.

[1] 
http://www.boost.org/doc/libs/1_41_0/doc/html/boost/algorithm/istarts_with.html

>
>  /*
>   * If the string "str" begins with the string found in "prefix", return 1.
> diff --git a/strbuf.c b/strbuf.c
> index b635f0bdc4..5779a2d591 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -11,6 +11,15 @@ int starts_with(const char *str, const char *prefix)
> return 0;
>  }
>
> +int startscase_with(const char *str, const char *prefix)
> +{
> +   for (; ; str++, prefix++)
> +   if (!*prefix)
> +   return 1;
> +   else if (tolower(*str) != tolower(*prefix))
> +   return 0;
> +}
> +
>  int skip_to_optional_arg_default(const char *str, const char *prefix,
>  const char **arg, const char *def)
>  {
> --
> 2.16.2
>



-- 
Duy


Re: [PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-07 Thread Junio C Hamano
Lars Schneider  writes:

> I would like to advise the dashed form as this seems to be the
> canonical form and it avoids cross platform issues. My macOS
> iconv does not support the form without dashes.

Sure, that is why I said canonicalization without inserting dash
does not make much sense, hence an interim step with only upcasing
is not a good idea.  A possible interim solution would be to do
nothing (no dash insertion, no upcasing) and fixing both in a later
follow-up patch, but as I said, I do not care too strongly either
way.

> Would this approach work for you?
>
>   const char *advise_msg = _(
>   "The file '%s' contains a byte order "
>   "mark (BOM). Please use UTF-%s as "
>   "working-tree-encoding.");
>   const char *stripped;
>   char *upper = xstrdup_toupper(enc);
>   upper[strlen(upper)-2] = '\0';
>   skip_prefix(upper, "UTF-", ) ||
>   skip_prefix(stripped, "UTF", );
>   advise(advise_msg, path, stripped);

Are you now interested in not having any interim step and jump
directly to the endgame solution?  If so, that is fine by me, too,
but as I already said earlier (i.e. not doing this BOM check for an
encoding that is not spelled in your canonical upcase-with-dash form
might be a feature that leaves an escape hatch), I am not all that
interested in enforcing policy at this point in the codepath to
begin with, so...



Re: git help clone: questions

2018-03-07 Thread kalle


Am 07.03.2018 um 23:45 schrieb Junio C Hamano:
> kalle  writes:
> 
>> Am 06.03.2018 um 02:36 schrieb Junio C Hamano:
>>> kalle  writes:
>>>
 -In the explanation of the option --reference: shouldn't there be
 written '' instead of  'reference repository'?
>>>
>>> "Shouldn't X be Y?" is not an effective way to communicate; it
>>> solicits a "no, the current one is fine." without any explanation.
>>>
>>> If you think X should be Y for some reason, please say "I think X
>>> should be Y BECAUSE Z" instead.  Without stating why you think
>>> differently from what those who wrote the current text, it is hard
>>> for people to respond either with "Yeah, you're right---I agree
>>> with Z" or with "No, Z does not hold because..."
>>>
>> I wrote this, because when it is written about 'reference repository', I
>> consider it not totally clear, which repository is meant, as the option
>> '--reference ' only names one as .
>> For reasons of clearness, I now propose writing "reference repository
>> ".
> 
> I do not have particularly a strong opinion, but I think it is very
> sensible to call the value given to the option "--reference" with a
> phrase that is not just "repository".
i agree and didn't state this. i proposed to add .
 could also be named .
> 
> As the command line of "clone" must name one repository (i.e. the
> one which we clone from), and its "--reference" option must name
> another repository as its value (i.e. the one that we borrow from in
> order to reduce the object transfer), calling both 
> makes it easier to confuse readers

you made my point

 unless the writer carefully makes
> sure that  in the desription is unambiguous and it is
> clear which one of these two repositories is being discussed by the
> context.>
> I just re-read the existing Documentation/git-clone.txt and looked
> for "reference".  All uses of "reference repository" in the prose
> made sense and I found it would not be an improvement if any of them
> is replaced with just "repository". 

This was never my proposal, though.

 It may be helpful to add
> something like:
> 
>--reference[-if-able] ::
>   +   Define a repository (reference repository) to borrow
>   +   objects from.  
>   If the reference repository is on the local machine,
>   ...
> 
> to define which repository we mean by that term, though.
> 

in all, it was just meant as a quite small proposal for me. i also don't
have any strong opinion about it.
greetings,
kalle


Re: [PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-07 Thread Lars Schneider

> On 07 Mar 2018, at 23:57, Junio C Hamano  wrote:
> 
> Lars Schneider  writes:
> 
>> At this point I thought it would make sense to make the advised
>> encoding name uppercase in both situations. OK with you?
> 
> In the endgame, if upcased and properly dashed form is always used,
> that would be good (if we are enforcing the policy, which I am not
> onboard 100% but it's your code and I do not care too strongly about
> it).  I do not see much point in an interim step that only upcases
> without doing the dash insertion.

I would like to advise the dashed form as this seems to be the
canonical form and it avoids cross platform issues. My macOS
iconv does not support the form without dashes.

Would this approach work for you?

const char *advise_msg = _(
"The file '%s' contains a byte order "
"mark (BOM). Please use UTF-%s as "
"working-tree-encoding.");
const char *stripped;
char *upper = xstrdup_toupper(enc);
upper[strlen(upper)-2] = '\0';
skip_prefix(upper, "UTF-", ) ||
skip_prefix(stripped, "UTF", );
advise(advise_msg, path, stripped);

Thanks,
Lars


Re: [PATCH v10 9/9] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-07 Thread Lars Schneider

> On 07 Mar 2018, at 23:52, Junio C Hamano  wrote:
> 
> Lars Schneider  writes:
> 
>> I don't think HT makes too much sense. However, isspace() is nice
>> and I will use it. Being more permissive on the inputs should hurt.
> 
> You are being incoherent in these three sentences.  If you want to
> be more strict and only allow SP, then isspace() is already too
> broad, as it does allow HT (and even CR and LF).

I meant "Being more permissive on the inputs shouldN'T hurt." :-)


> I do think HT makes just as much sense as SP, though, so if you use
> isspace(), that would be perfectly fine.

OK!

- Lars



Re: [PATCH v10 6/9] convert: add 'working-tree-encoding' attribute

2018-03-07 Thread Junio C Hamano
Lars Schneider  writes:

> Nice catch. What do you think about this solution using is_encoding_utf8()
> from utf.c?

That helper was invented exactly for a case like this, I would
think.


Re: [PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-07 Thread Junio C Hamano
Lars Schneider  writes:

> In the case of has_prohibited_utf_bom() you are right as we are 
> dropping the BE/LE suffix in the advise. However, look at the 
> is_missing_required_utf_bom() advise. Here we *add* BE/LE.

So?  Then add BE/LE like "Utf-16BE" or "utf16BE".  You do not have
enough information to decide what is best for the user anyway (which
is why I am not convinced that it is even a good idea to always
upcase in the first place).

> At this point I thought it would make sense to make the advised
> encoding name uppercase in both situations. OK with you?

In the endgame, if upcased and properly dashed form is always used,
that would be good (if we are enforcing the policy, which I am not
onboard 100% but it's your code and I do not care too strongly about
it).  I do not see much point in an interim step that only upcases
without doing the dash insertion.




Re: [PATCH v10 6/9] convert: add 'working-tree-encoding' attribute

2018-03-07 Thread Lars Schneider

> On 07 Mar 2018, at 18:54, Eric Sunshine  wrote:
> 
> On Wed, Mar 7, 2018 at 12:30 PM,   wrote:
>> [...]
>> Add an attribute to tell Git what encoding the user has defined for a
>> given file. If the content is added to the index, then Git converts the
>> content to a canonical UTF-8 representation. On checkout Git will
>> reverse the conversion.
>> 
>> Signed-off-by: Lars Schneider 
>> ---
>> Documentation/gitattributes.txt  |  80 +++
>> diff --git a/convert.c b/convert.c
>> @@ -265,6 +266,78 @@ static int will_convert_lf_to_crlf(size_t len, struct 
>> text_stat *stats,
>> +static const char *default_encoding = "UTF-8";
>> @@ -978,6 +1051,21 @@ static int ident_to_worktree(const char *path, const 
>> char *src, size_t len,
>> +static const char *git_path_check_encoding(struct attr_check_item *check)
>> +{
>> +   const char *value = check->value;
>> +
>> +   if (ATTR_TRUE(value) || ATTR_FALSE(value) || ATTR_UNSET(value) ||
>> +   !strlen(value))
>> +   return NULL;
>> +
>> +   /* Don't encode to the default encoding */
>> +   if (!strcasecmp(value, default_encoding))
>> +   return NULL;
> 
> As of v10, the rest of the code accepts encoding names "UTF-xx" and
> "UTFxx" (case insensitive), however, this check recognizes only
> "UTF-8" (case insensitive). For consistency, one would expect this
> also to recognize "UTF8" (case insensitive).

Nice catch. What do you think about this solution using is_encoding_utf8()
from utf.c?

if (is_encoding_utf8(value) && is_encoding_utf8(default_encoding)) 

- Lars

Re: [PATCH v10 9/9] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-07 Thread Junio C Hamano
Lars Schneider  writes:

> Although the line is unnecessary, I felt it is safer/easier to 
> understand and maintain. Since both of you tripped over it, I will
> remove it though.

I didn't actually trip over it.  It made it look as if the coder
didn't understand what the code is doing to have that extra line.

> I don't think HT makes too much sense. However, isspace() is nice
> and I will use it. Being more permissive on the inputs should hurt.

You are being incoherent in these three sentences.  If you want to
be more strict and only allow SP, then isspace() is already too
broad, as it does allow HT (and even CR and LF).

I do think HT makes just as much sense as SP, though, so if you use
isspace(), that would be perfectly fine.


Re: [PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-07 Thread Lars Schneider

> On 07 Mar 2018, at 23:32, Junio C Hamano  wrote:
> 
> Lars Schneider  writes:
> 
>> I also would have liked to advise "UTF-16" instead of "UTF16" as
>> you suggested. However, that required a few more lines and I wanted
>> to keep the change to a minimum. I feel this could be added in a
>> follow up patch.
> 
> I'd say the whole upcase thing belongs to such a follow-up patch if
> that is the case.
> 
>>> On the other hand, if we are not enforcing such a policy decision
>>> but merely explaining a way to work around this check, then it may
>>> be better to give a variant with the smaller difference from the
>>> original (i.e. without up-casing).
>> 
>> See example mentioned above: "Utf-16". How would you handle that?
> 
> Dropping LE suffix from "Utf-16LE" or "Utf16LE" would yield "Utf-16"
> or "Utf16" if the advise message does not force policy, or "UTF-16"
> in the canoical form if it does.  Is there a problem?

In the case of has_prohibited_utf_bom() you are right as we are 
dropping the BE/LE suffix in the advise. However, look at the 
is_missing_required_utf_bom() advise. Here we *add* BE/LE.

At this point I thought it would make sense to make the advised
encoding name uppercase in both situations. OK with you?

- Lars






Re: git help clone: questions

2018-03-07 Thread Junio C Hamano
kalle  writes:

> Am 06.03.2018 um 02:36 schrieb Junio C Hamano:
>> kalle  writes:
>> 
>>> -In the explanation of the option --reference: shouldn't there be
>>> written '' instead of  'reference repository'?
>> 
>> "Shouldn't X be Y?" is not an effective way to communicate; it
>> solicits a "no, the current one is fine." without any explanation.
>> 
>> If you think X should be Y for some reason, please say "I think X
>> should be Y BECAUSE Z" instead.  Without stating why you think
>> differently from what those who wrote the current text, it is hard
>> for people to respond either with "Yeah, you're right---I agree
>> with Z" or with "No, Z does not hold because..."
>> 
> I wrote this, because when it is written about 'reference repository', I
> consider it not totally clear, which repository is meant, as the option
> '--reference ' only names one as .
> For reasons of clearness, I now propose writing "reference repository
> ".

I do not have particularly a strong opinion, but I think it is very
sensible to call the value given to the option "--reference" with a
phrase that is not just "repository".

As the command line of "clone" must name one repository (i.e. the
one which we clone from), and its "--reference" option must name
another repository as its value (i.e. the one that we borrow from in
order to reduce the object transfer), calling both 
makes it easier to confuse readers unless the writer carefully makes
sure that  in the desription is unambiguous and it is
clear which one of these two repositories is being discussed by the
context.

I just re-read the existing Documentation/git-clone.txt and looked
for "reference".  All uses of "reference repository" in the prose
made sense and I found it would not be an improvement if any of them
is replaced with just "repository".  It may be helpful to add
something like:

 --reference[-if-able] ::
+   Define a repository (reference repository) to borrow
+   objects from.  
If the reference repository is on the local machine,
...

to define which repository we mean by that term, though.


Re: [PATCH v10 9/9] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-07 Thread Lars Schneider

> On 07 Mar 2018, at 20:59, Junio C Hamano  wrote:
> 
> lars.schnei...@autodesk.com writes:
> 
>> +static int check_roundtrip(const char* enc_name)
> 
> The asterisk sticks to the variable, not type.

Argh. I need to put this check into Travis CI ;-)


>> +{
>> +/*
>> + * check_roundtrip_encoding contains a string of space and/or
>> + * comma separated encodings (eg. "UTF-16, ASCII, CP1125").
>> + * Search for the given encoding in that string.
>> + */
>> +const char *found = strcasestr(check_roundtrip_encoding, enc_name);
>> +const char *next;
>> +int len;
>> +if (!found)
>> +return 0;
>> +next = found + strlen(enc_name);
>> +len = strlen(check_roundtrip_encoding);
>> +return (found && (
>> +/*
>> + * check that the found encoding is at the
>> + * beginning of check_roundtrip_encoding or
>> + * that it is prefixed with a space or comma
>> + */
>> +found == check_roundtrip_encoding || (
>> +found > check_roundtrip_encoding &&
>> +(*(found-1) == ' ' || *(found-1) == ',')
>> +)
> 
> The second line is unneeded, as we know a non-NULL found either
> points at check_roundtrip_encoding or somewhere to the right, and
> the first test already checked the "points exactly at" case.

Eric objected that too [1], but noted the documentation value:

Although the 'found > check_roundtrip_encoding' expression is
effectively dead code in this case, it does document that the
programmer didn't just blindly use '*(found-1)' without taking
boundary conditions into consideration.

(It's dead code because, at this point, we know that strcasestr()
didn't return NULL and we know that 'found' doesn't equal
'check_roundtrip_encoding', therefore it _must_ be greater than
'check_roundtrip_encoding'.)

[1] 
https://public-inbox.org/git/CAPig+cR81J3fTOtrgAumAs=rc5hqyffsmeb-ru-yf_ahfub...@mail.gmail.com/

Although the line is unnecessary, I felt it is safer/easier to 
understand and maintain. Since both of you tripped over it, I will
remove it though.


> This is defined to be a comma separated list, so it is unnecessary
> to accept  == <"FOO, SHIFT-JIS, BAR", "SHIFT-JIS">; if you
> allow SP, perhaps "isspace(found[-1]) || found[-1] == ','" to also
> allow HT may also be appropriate.

I don't think HT makes too much sense. However, isspace() is nice
and I will use it. Being more permissive on the inputs should hurt.


>  I think "comma or whitespace
> separated list" is fine; in any case, the comment near the beginning
> of this function does not match new text in Documentation/config.txt
> added by this patch.

Nice catch. Will fix.


Thanks,
Lars



Re: [PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-07 Thread Junio C Hamano
Lars Schneider  writes:

> I also would have liked to advise "UTF-16" instead of "UTF16" as
> you suggested. However, that required a few more lines and I wanted
> to keep the change to a minimum. I feel this could be added in a
> follow up patch.

I'd say the whole upcase thing belongs to such a follow-up patch if
that is the case.

>> On the other hand, if we are not enforcing such a policy decision
>> but merely explaining a way to work around this check, then it may
>> be better to give a variant with the smaller difference from the
>> original (i.e. without up-casing).
>
> See example mentioned above: "Utf-16". How would you handle that?

Dropping LE suffix from "Utf-16LE" or "Utf16LE" would yield "Utf-16"
or "Utf16" if the advise message does not force policy, or "UTF-16"
in the canoical form if it does.  Is there a problem?


Re: git help clone: questions

2018-03-07 Thread kalle
I wrote this, because when it is written about 'reference repository', I
consider it not totally clear, which repository is meant, as the option
'--reference ' only names one as .
For reasons of clearness, I now propose writing "reference repository
".

kalle

Am 06.03.2018 um 02:36 schrieb Junio C Hamano:
> kalle  writes:
> 
>> -In the explanation of the option --reference: shouldn't there be
>> written '' instead of  'reference repository'?
> 
> "Shouldn't X be Y?" is not an effective way to communicate; it
> solicits a "no, the current one is fine." without any explanation.
> 
> If you think X should be Y for some reason, please say "I think X
> should be Y BECAUSE Z" instead.  Without stating why you think
> differently from what those who wrote the current text, it is hard
> for people to respond either with "Yeah, you're right---I agree
> with Z" or with "No, Z does not hold because..."
> 


Re: [PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-07 Thread Lars Schneider

> On 07 Mar 2018, at 20:49, Junio C Hamano  wrote:
> 
> lars.schnei...@autodesk.com writes:
> 
>> +static int validate_encoding(const char *path, const char *enc,
>> +  const char *data, size_t len, int die_on_error)
>> +{
>> +/* We only check for UTF here as UTF?? can be an alias for UTF-?? */
>> +if (startscase_with(enc, "UTF")) {
>> +/*
>> + * Check for detectable errors in UTF encodings
>> + */
>> +if (has_prohibited_utf_bom(enc, data, len)) {
>> +const char *error_msg = _(
>> +"BOM is prohibited in '%s' if encoded as %s");
>> +/*
>> + * This advice is shown for UTF-??BE and UTF-??LE 
>> encodings.
>> + * We cut off the last two characters of the encoding 
>> name
>> + # to generate the encoding name suitable for BOMs.
>> + */
> 
> Yuck.  The code pretends to abstract away the details in a helper
> has_prohibited_x() yet the caller still knows quite a lot.

True, but has_prohibited_x() cannot create a proper error/advise
message unless we give it more parameters (e.g. path name).
Therefore, I don't see a better way right now.


>> +const char *advise_msg = _(
>> +"The file '%s' contains a byte order "
>> +"mark (BOM). Please use %s as "
>> +"working-tree-encoding.");
>> +char *upper_enc = xstrdup_toupper(enc);
>> +upper_enc[strlen(upper_enc)-2] = '\0';
>> +advise(advise_msg, path, upper_enc);
>> +free(upper_enc);
> 
> I think this up-casing is more problematic than without, not from
> the point of view of the internal code, but from the point of view
> of the end user experience.  When the user writes utf16le or
> utf-16le and the data does not trigger the BOM check, we are likely
> to successfully convert it.  I do not see the merit of suggesting
> UTF16 or UTF-16 in such a case, over telling them to just drop the
> byte-order suffix from the encoding names (i.e. utf16 or utf-16).
> 
> If you are trying to force/nudge people in the direction of
> canonical way of spelling things (which may not be a bad idea), then
> "utf16le" as the original input would want to result in "UTF-16"
> with dash in the advise, no?

Correct. In the error messages I kept the encoding name "as-is" and
only in the advise message I used the uppercase variant to steer
people into the canonical direction. My initial reason for this was
that in is_missing_required_utf_bom() we add "BE/LE" to the encoding
in the advise message. Let's say the user used "Utf-16" as encoding.
 Should "BE/LE" be upper case or lower case? To avoid that question 
I made it always upper case.

I also would have liked to advise "UTF-16" instead of "UTF16" as
you suggested. However, that required a few more lines and I wanted
to keep the change to a minimum. I feel this could be added in a
follow up patch.


> On the other hand, if we are not enforcing such a policy decision
> but merely explaining a way to work around this check, then it may
> be better to give a variant with the smaller difference from the
> original (i.e. without up-casing).

See example mentioned above: "Utf-16". How would you handle that?


Thanks,
Lars



Re: man gittutorial: patch proposal

2018-03-07 Thread kalle
hello.
1.I understood, that verbose commands are quoted in ` ' or ' '. While
the points I changed are not verbose commands. Therefore I didn't see
any sense in them. Because of the different quoting ways (e.g. `merge`),
I thought that someone didn't understand the quoting concept. But it
could also be me...

2.yes please. I will read this section.

kalle

Am 06.03.2018 um 00:25 schrieb Jonathan Nieder:
> Hi,
> 
> kalle wrote:
> 
>> see attachment.
> 
> Thanks for writing.  A few questions:
> 
>  1. Can you say a little more about the rationale for this change?
> E.g. is the current formatting sending a confusing message?  Is the
> current formatting intending to use '' as quotes and using italics
> instead?  If so, should this use "" to fix it?
> 
>  2. May we forge your sign-off? See the section "Certify your work"
> in Documentation/SubmittingPatches for more about what this means.
> 
> Thanks,
> Jonathan
> 


[GSoC] Questions about "convert interactive rebase to C"

2018-03-07 Thread Alban Gruin
Hi,

I was reading the email related to the "convert interactive rebase to C"
idea[1], and I have a few questions about it:

> So the first goal would be to retire git-rebase--interactive.sh. For that
> to happen, --root needs to be supported first.

Combining rebase -i and --root seems to work fine, am I mistaking? Is
there still work to be done here?


> And then the
> command-line option parsing needs to be moved to rebase--helper, too.

rebase--helper was originally made for rebase--interactive, and it is
still the only script that uses it, according to git-grep.  If someone
rewrites rebase--interactive in C, why parse the command line in
rebase--helper instead of retiring it? Is there other plans that I missed?

Regards, Alban.


[1]
https://public-inbox.org/git/alpine.DEB.2.20.1609021432070.129229@virtualbox/


Re: [PATCH] git.el: handle default excludesfile properly

2018-03-07 Thread Dorab Patel
Thanks for updating us with the status of git.el.

The last time I looked at magit, it was too heavyweight for my needs.
I like the simplicity of git.el. But perhaps it's time for me to take
another look at magit.


On Tue, Mar 6, 2018 at 3:54 AM, Alexandre Julliard  wrote:
> Junio C Hamano  writes:
>
>> Having said that, I am sorry to say that I am not sure if the copy
>> we have is the one to be patched, so I would appreciate if Alexandre
>> (cc'ed) can clarify the situation.  The last change done to our copy
>> of the script is from 2012, and I do not know if Alexandre is still
>> taking care of it here but the script is so perfect that there was
>> no need to update it for the past 5 years and we haven't seen an
>> update, or the caninical copy is now being maintained elsewhere and
>> we only have a stale copy, or what.
>
> This is the canonical version, and I guess in theory I'm still taking
> care of it. However, the need that git.el was originally addressing is
> now fulfilled by better tools. As such, I feel that it has outlived its
> usefulness, and I'm no longer actively developing it.
>
> I'd recommend that anybody still using it switch to Magit, which is
> being actively maintained, and IMO superior to git.el in all respects.
>
> --
> Alexandre Julliard
> julli...@winehq.org


Re: [PATCH] Support long format for log-based submodule diff

2018-03-07 Thread Junio C Hamano
Robert Dailey  writes:

> I could have gone through the effort to make this more configurable, but
> before doing that level of work I wanted to get some discussion going to
> understand first if this is a useful change and second how it should be
> configured. For example, we could allow:
>
> $ git diff --submodule=long-log
>
> Or a supplementary option such as:
>
> $ git diff --submodule=log --submodule-log-detail=(long|short)
>
> I'm not sure what makes sense here. I welcome thoughts/discussion and
> will provide follow-up patches.

My quick looking around reveals that prepare_submodule_summary() is
called only by show_submodule_summary(), which in turn is called
only from builtin_diff() in a codepath like this:

if (o->submodule_format == DIFF_SUBMODULE_LOG &&
(!one->mode || S_ISGITLINK(one->mode)) &&
(!two->mode || S_ISGITLINK(two->mode))) {
show_submodule_summary(o, one->path ? one->path : two->path,
>oid, >oid,
two->dirty_submodule);
return;
} else if (o->submodule_format == DIFF_SUBMODULE_INLINE_DIFF &&
   (!one->mode || S_ISGITLINK(one->mode)) &&
   (!two->mode || S_ISGITLINK(two->mode))) {
show_submodule_inline_diff(o, one->path ? one->path : two->path,
>oid, >oid,
two->dirty_submodule);
return;
}

It looks like introducing a new value to o->submodule_format (enum
diff_submodule_format defined in diff.h) would be one natural way to
extend this codepath, at least to me from a quick glance.

It also looks to me that the above may become far easier to read if
the common "are we dealing with a filepair  that involves
submodules?" check in the above if/else if cascade is factored out,
perhaps like this as a preliminary clean-up step, before adding a
new value:

if ((!one->mode || S_ISGITLINK(one->mode)) &&
(!two->mode || S_ISGITLINK(two->mode))) {
switch (o->submodule_format) {
case DIFF_SUBMODULE_LOG:
... do the "log" thing ...
return;
case DIFF_SUBMODULE_INLINE_DIFF:
... do the "inline" thing ...
return;
default:
break;
}
}

Then the place to add a new format would be trivially obvious,
i.e. just add a new case arm to call a new function to give the
summary.


[PATCH] Support long format for log-based submodule diff

2018-03-07 Thread Robert Dailey
I am experimenting with a version of submodule diff (using log style)
that prints the commits brought in from merges, while excluding the
merge commits themselves. This is useful in cases where a merge commit's
summary does not fully explain the changes being merged (for example,
for longer-lived branches).

I could have gone through the effort to make this more configurable, but
before doing that level of work I wanted to get some discussion going to
understand first if this is a useful change and second how it should be
configured. For example, we could allow:

$ git diff --submodule=long-log

Or a supplementary option such as:

$ git diff --submodule=log --submodule-log-detail=(long|short)

I'm not sure what makes sense here. I welcome thoughts/discussion and
will provide follow-up patches.

Signed-off-by: Robert Dailey 
---
 submodule.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index 2967704317..a0a62ad7bd 100644
--- a/submodule.c
+++ b/submodule.c
@@ -428,7 +428,8 @@ static int prepare_submodule_summary(struct rev_info *rev, 
const char *path,
init_revisions(rev, NULL);
setup_revisions(0, NULL, rev, NULL);
rev->left_right = 1;
-   rev->first_parent_only = 1;
+   rev->max_parents = 1;
+   rev->first_parent_only = 0;
left->object.flags |= SYMMETRIC_LEFT;
add_pending_object(rev, >object, path);
add_pending_object(rev, >object, path);
-- 
2.13.1.windows.2



Re: Bug report: git-stash doesn't return correct status code

2018-03-07 Thread Junio C Hamano
Junio C Hamano  writes:

> "Vromen, Tomer"  writes:
>
>>> git stash && git checkout -b new-feature-branch && git stash pop
>>
>> This is useful when I realize that I want to open a new branch for my 
>> changes (that I haven't committed yet).
>> However, I might have forgotten to save my changes in the editor, so 
>> git-stash will give this error:
>>
>> No local changes to save
>
> This is given with "say" and not with "die", as this is merely an
> informational diagnosis.  The command did not find any erroneous
> condition, the command did not fail to do anything it was supposed
> to do, so the command exited with 0 status.

I guess that is only half an answer.  If you really want to avoid
creating the new branch when the working tree and the index are
clean, you'd need to check that yourself before that three-command
sequence.  In our shell script, we use these as such a check:

git update-index --refresh -q --ignore-submodules
git diff-files --quiet --ignore-submodules &&
git diff-index --cached --quiet --ignore-submodules HEAD --

But stepping back a bit, why do you even need stash save/pop around
"checkout -b new-feature-branch" (that implicitly branches at the
tip) in the first place?  "checkout -b" that begins at the current
HEAD does not touch the index nor the working tree and take the
local changes with you to the new branch, so save/pop around it
seems totally pointless.


Re: [PATCH v10 9/9] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-07 Thread Junio C Hamano
lars.schnei...@autodesk.com writes:

> +static int check_roundtrip(const char* enc_name)

The asterisk sticks to the variable, not type.

> +{
> + /*
> +  * check_roundtrip_encoding contains a string of space and/or
> +  * comma separated encodings (eg. "UTF-16, ASCII, CP1125").
> +  * Search for the given encoding in that string.
> +  */
> + const char *found = strcasestr(check_roundtrip_encoding, enc_name);
> + const char *next;
> + int len;
> + if (!found)
> + return 0;
> + next = found + strlen(enc_name);
> + len = strlen(check_roundtrip_encoding);
> + return (found && (
> + /*
> +  * check that the found encoding is at the
> +  * beginning of check_roundtrip_encoding or
> +  * that it is prefixed with a space or comma
> +  */
> + found == check_roundtrip_encoding || (
> + found > check_roundtrip_encoding &&
> + (*(found-1) == ' ' || *(found-1) == ',')
> + )

The second line is unneeded, as we know a non-NULL found either
points at check_roundtrip_encoding or somewhere to the right, and
the first test already checked the "points exactly at" case.

This is defined to be a comma separated list, so it is unnecessary
to accept  == <"FOO, SHIFT-JIS, BAR", "SHIFT-JIS">; if you
allow SP, perhaps "isspace(found[-1]) || found[-1] == ','" to also
allow HT may also be appropriate.  I think "comma or whitespace
separated list" is fine; in any case, the comment near the beginning
of this function does not match new text in Documentation/config.txt
added by this patch.


Re: [PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-07 Thread Junio C Hamano
lars.schnei...@autodesk.com writes:

> +static int validate_encoding(const char *path, const char *enc,
> +   const char *data, size_t len, int die_on_error)
> +{
> + /* We only check for UTF here as UTF?? can be an alias for UTF-?? */
> + if (startscase_with(enc, "UTF")) {
> + /*
> +  * Check for detectable errors in UTF encodings
> +  */
> + if (has_prohibited_utf_bom(enc, data, len)) {
> + const char *error_msg = _(
> + "BOM is prohibited in '%s' if encoded as %s");
> + /*
> +  * This advice is shown for UTF-??BE and UTF-??LE 
> encodings.
> +  * We cut off the last two characters of the encoding 
> name
> +  # to generate the encoding name suitable for BOMs.
> +  */

Yuck.  The code pretends to abstract away the details in a helper
has_prohibited_x() yet the caller still knows quite a lot.

> + const char *advise_msg = _(
> + "The file '%s' contains a byte order "
> + "mark (BOM). Please use %s as "
> + "working-tree-encoding.");
> + char *upper_enc = xstrdup_toupper(enc);
> + upper_enc[strlen(upper_enc)-2] = '\0';
> + advise(advise_msg, path, upper_enc);
> + free(upper_enc);

I think this up-casing is more problematic than without, not from
the point of view of the internal code, but from the point of view
of the end user experience.  When the user writes utf16le or
utf-16le and the data does not trigger the BOM check, we are likely
to successfully convert it.  I do not see the merit of suggesting
UTF16 or UTF-16 in such a case, over telling them to just drop the
byte-order suffix from the encoding names (i.e. utf16 or utf-16).

If you are trying to force/nudge people in the direction of
canonical way of spelling things (which may not be a bad idea), then
"utf16le" as the original input would want to result in "UTF-16"
with dash in the advise, no?

On the other hand, if we are not enforcing such a policy decision
but merely explaining a way to work around this check, then it may
be better to give a variant with the smaller difference from the
original (i.e. without up-casing).


Re: [PATCH v10 6/9] convert: add 'working-tree-encoding' attribute

2018-03-07 Thread Junio C Hamano
lars.schnei...@autodesk.com writes:

> +static const char *git_path_check_encoding(struct attr_check_item *check)
> +{
> + const char *value = check->value;
> +
> + if (ATTR_TRUE(value) || ATTR_FALSE(value) || ATTR_UNSET(value) ||
> + !strlen(value))
> + return NULL;

This means that having "* working-tree-encoding" (without "=value")
in your .gitattributes file is silently ignored.  

Thinking aloud.  I wonder if that is something we would want to warn
about so that the project can fix it, perhaps?  Or would that become
too noisy to use an older version of Git that includes this series
when a newer version that defines new meanings to boolean
(set/unset) values for w-t-e attribute becomes available and
projects adopt it?  On the other hand, with this code as-is or with
an additional warning, such an "older version" of Git by definition
behaves differently from such a "newer version" that does something
different when the attribute is not a non-empty string, so it is
quite likely that we won't be able to redefine or extend the meaning
of w-t-e in a way like that.

Which means that the only sensible way to make sure we _could_ later
extend the meaning of w-t-e and make it behave differently when set
to a non-empty string is to make it an error in _this_ "older"
version.  That way, of course people cannot use the "older" version
on a newer project that depends on the way how "newer" Git treats
w-t-e that is set to, say, "true", but we won't silently (or loudly)
do wrong things to the project's data.



Re: Bug report: git-stash doesn't return correct status code

2018-03-07 Thread Junio C Hamano
"Vromen, Tomer"  writes:

>> git stash && git checkout -b new-feature-branch && git stash pop
>
> This is useful when I realize that I want to open a new branch for my changes 
> (that I haven't committed yet).
> However, I might have forgotten to save my changes in the editor, so 
> git-stash will give this error:
>
> No local changes to save

This is given with "say" and not with "die", as this is merely an
informational diagnosis.  The command did not find any erroneous
condition, the command did not fail to do anything it was supposed
to do, so the command exited with 0 status.



Re: [PATCH 2/2] completion: simplify _git_notes

2018-03-07 Thread Junio C Hamano
SZEDER Gábor  writes:

> That works fine, but this would work just as well and has one less
> case arm:

OK, I missed that "obvious optimization" opportunity.  I agree that
would work.

>
> diff --git a/contrib/completion/git-completion.bash 
> b/contrib/completion/git-completion.bash
> index ab80f4e6e8..038af63c1a 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -1839,6 +1839,8 @@ _git_notes ()
>   *,--*)
>   __gitcomp_builtin notes_$subcommand
>   ;;
> + prune,*)
> + ;;
>   *)
>   case "$prev" in
>   -m|-F)


Re: [PATCH v2 3/5] gc --auto: exclude base pack if not enough mem to "repack -ad"

2018-03-07 Thread Junio C Hamano
Johannes Schindelin  writes:

> Junio, may I ask you to put this into a SQUASH??? commit so that the
> Windows build no longer fails?

Thanks for a reminder; I also spotted it (I first thought I screwed
up in my editor while reviewing and then went back to the original
on the list) and sent out a response, but then by that time I was
already far into the day's integration cycle.

Will queue a SQUASH??? at the tip.


Re: [PATCH v2 3/5] gc --auto: exclude base pack if not enough mem to "repack -ad"

2018-03-07 Thread Junio C Hamano
Duy Nguyen  writes:

>>> +Set environment variable `GIT_TRACE` in order to see the memory usage
>>> +estimation in `git gc --auto` that determines whether the base pack is
>>> +kept.
>>
>> This is somewhat a puzzling use of trace.  It sounds more like a way
>> to find out "how" memory usage estimation is done and arriving at a
>> wrong value for those who want to debug the feature.
>
> Yeah. I'm not sure if this estimation could be really problematic that
> people need to debug this way. A cleaner way (if we think people will
> need this often) is just add a new option in "git gc" to report this
> estimation breakdown and do nothing else.

Actually after reading the implementation and seeing what it does, I
personally do not have any problem with the way GIT_TRACE is used
for this purpose in this patch.  I am not sure how interesting the
information given by that codepath in real life; it looks even less
intereseting than say what comes out of "verify-pack --stat".

>> This is finding the largest pack.
>
> The discussion on .keep files raises one question for me, what if the
> largest pack already has a .keep file, do we still consider it the
> base pack, or should we find the next largest non-kept pack?
>
> I'm guessing we keep things simple here and ignore .keep files.

I agree that it is a sensible design decision.

>>> +#elif defined(GIT_WINDOWS_NATIVE)
>>> + MEMORYSTATUSEX memInfo;
>>> +
>>> + memInfo.dwLength = sizeof(MEMORYSTATUSEX);
>>> + if (GlobalMemoryStatusEx())
>>> + return memInfo;ullTotalPhys;
>>
>> Is this legal C in Microsoft land?
>
> That's the problem with writing code without a way to test it. At
> least travis helped catch a compiler bug on mac.
>
> I'm torn between just #error here and let Windows/Mac guys implement
> it (which they may scream "too much work, I don't wanna") but if I
> help write something first, yes things are potentially broken and need
> verification from those guys. I guess I'll just fix this up and hope
> non-linux guys do the rest.

Yup, we all collaborate and help in ways each of us can.  None of us
can be expected to do any more than that ;-)

>> I wonder if the above should go somewhere under compat/ without
>> ifdef but split into separate files for individual archs (I do not
>> know the answer to this question).
>
> My first choice too. I chose this way after seeing online_cpus()
> thread-utils.c. Not sure which way is best either.

OK.

>> But to those who say "packs larger than this value is too big" via
>> configuration, keeping only the largest of these above-threshold
>> packs would look counter-intuitive, wouldn't it, I wonder?
>
> I think I'll just clarify this in the document. There may be a use
> case for keeping multiple large packs, but I don't see it (*). We can
> deal with it when it comes.

When the project's history grows too much, a large pack that holds
its first 10 years of stuff, together with another one that holds
its second 20 years of stuff, may both be larger than the threshold
and want to be kept.  If we pick only the largest one, we would
explode the other one and repack together with loose objects.

But realistically, those who would want to control the way in which
their repository is packed to such a degree are very likely to add
".keep" files to these two packfiles themselves, so the above would
probably not a concern.  Perhaps we shouldn't do the "automatically
pick the largest one and exclude from repacking" when there is a
packfile that is marked with ".keep"?

> The use of super large gc.bigBasePackThreshold to disable this keeping
> base pack is intended. But I can't go too high here it may break
> limits on 32 bit platforms. And 2g sounds really arbitrary.

You could use 42m instead to clarify that it really is an arbitrary
threshold that was chosen only for the purpose of this test perhaps?
;-)


Re: [RFC] Rebasing merges: a jorney to the ultimate solution(RoadClear)

2018-03-07 Thread Junio C Hamano
Johannes Schindelin  writes:

>> OK, does this mean we want to wait before merging the "recreate
>> merge" topic down to 'next'?  For more than a few weeks, it has been
>> slated for 'next'.
>
> Maybe a few more days.
> ...
> I want to discuss this in the other subthread, though.

If we are talking about a drastic change, a few more days may not be
sufficient, but we are not in a hurry, as this already sounds like a
2.18 material anyway.  As you made it clear that it is OK not to
merge the current one for now, my objective of asking the question
is already satisfied ;-)



Re: [PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-07 Thread Eric Sunshine
On Wed, Mar 7, 2018 at 12:30 PM,   wrote:
> Check that new content is valid with respect to the user defined
> 'working-tree-encoding' attribute.
>
> Signed-off-by: Lars Schneider 
> ---
> diff --git a/convert.c b/convert.c
> @@ -266,6 +266,58 @@ static int will_convert_lf_to_crlf(size_t len, struct 
> text_stat *stats,
> +static int validate_encoding(const char *path, const char *enc,
> + const char *data, size_t len, int die_on_error)
> +{
> +   /* We only check for UTF here as UTF?? can be an alias for UTF-?? */
> +   if (startscase_with(enc, "UTF")) {
> +   /*
> +* Check for detectable errors in UTF encodings
> +*/
> +   if (has_prohibited_utf_bom(enc, data, len)) {
> +   const char *error_msg = _(
> +   "BOM is prohibited in '%s' if encoded as %s");
> +   /*
> +* This advice is shown for UTF-??BE and UTF-??LE 
> encodings.
> +* We cut off the last two characters of the encoding 
> name
> +# to generate the encoding name suitable for BOMs.

s/#/*/

> +*/
> +   const char *advise_msg = _(
> +   "The file '%s' contains a byte order "
> +   "mark (BOM). Please use %s as "
> +   "working-tree-encoding.");
> +   char *upper_enc = xstrdup_toupper(enc);
> +   upper_enc[strlen(upper_enc)-2] = '\0';

Due to startscase_with(...,"UTF"), we know at this point that the
string is at least 3 characters long, thus it's safe to back up by 2.
Good.

> +   advise(advise_msg, path, upper_enc);
> +   free(upper_enc);
> +   if (die_on_error)
> +   die(error_msg, path, enc);
> +   else {
> +   return error(error_msg, path, enc);
> +   }
> diff --git a/t/t0028-working-tree-encoding.sh 
> b/t/t0028-working-tree-encoding.sh
> @@ -62,6 +62,46 @@ test_expect_success 'check $GIT_DIR/info/attributes 
> support' '
>  for i in 16 32
>  do
> +   test_expect_success "check prohibited UTF-${i} BOM" '
> +   test_when_finished "git reset --hard HEAD" &&
> +
> +   echo "*.utf${i}be text working-tree-encoding=utf-${i}be" 
> >>.gitattributes &&
> +   echo "*.utf${i}le text working-tree-encoding=utf-${i}le" 
> >>.gitattributes &&

v10 is checking only hyphenated lowercase encoding name; earlier
versions checked uppercase. For better coverage, it would be nice to
check several combinations: all uppercase, all lowercase, mixed case,
hyphenated, not hyphenated.

I'm not suggesting running all the tests repeatedly but rather just
varying the format of the encoding name in these tests you're adding.
For instance, the above could instead be:

echo "*.utf${i}be text working-tree-encoding=UTF-${i}be" >>.gitattributes &&
echo "*.utf${i}le text working-tree-encoding=utf${i}LE" >>.gitattributes &&

or something.

> +   # Here we add a UTF-16 (resp. UTF-32) files with BOM 
> (big/little-endian)
> +   # but we tell Git to treat it as UTF-16BE/UTF-16LE (resp. 
> UTF-32).
> +   # In these cases the BOM is prohibited.
> +   cp bebom.utf${i}be.raw bebom.utf${i}be &&
> +   test_must_fail git add bebom.utf${i}be 2>err.out &&
> +   test_i18ngrep "fatal: BOM is prohibited .* utf-${i}be" 
> err.out &&
> +
> +   cp lebom.utf${i}le.raw lebom.utf${i}be &&
> +   test_must_fail git add lebom.utf${i}be 2>err.out &&
> +   test_i18ngrep "fatal: BOM is prohibited .* utf-${i}be" 
> err.out &&
> +
> +   cp bebom.utf${i}be.raw bebom.utf${i}le &&
> +   test_must_fail git add bebom.utf${i}le 2>err.out &&
> +   test_i18ngrep "fatal: BOM is prohibited .* utf-${i}le" 
> err.out &&
> +
> +   cp lebom.utf${i}le.raw lebom.utf${i}le &&
> +   test_must_fail git add lebom.utf${i}le 2>err.out &&
> +   test_i18ngrep "fatal: BOM is prohibited .* utf-${i}le" err.out
> +   '
> +
> +   test_expect_success "check required UTF-${i} BOM" '
> +   test_when_finished "git reset --hard HEAD" &&
> +
> +   echo "*.utf${i} text working-tree-encoding=utf-${i}" 
> >>.gitattributes &&

This is another opportunity for checking a variation (case,
hyphenation) of the encoding name rather than using only hyphenated
lowercase.

> +
> +   cp nobom.utf${i}be.raw nobom.utf${i} &&
> +   test_must_fail git add nobom.utf${i} 2>err.out &&
> +   test_i18ngrep "fatal: BOM is required .* utf-${i}" err.out &&
> +
> +   cp 

Hello

2018-03-07 Thread Sare Ouedraogo
Am Mr.Sare Ouedraogo.i work in one of the prime bank here in Burkina 
Faso, i want the bank to transfer the money left by our  late customer 
is a foreigner from Korea. can you invest this money  and also help the 
poor'  the amount value at  $13,300,000.00  (Thirteen Million Three 
Hundred Thousand United States American Dollars), left in his account 
still unclaimed. more details will be given to you if you are 
interested.

best regards
Mr. Sare Ouedraogo


Re: [PATCH v10 6/9] convert: add 'working-tree-encoding' attribute

2018-03-07 Thread Eric Sunshine
On Wed, Mar 7, 2018 at 12:30 PM,   wrote:
> [...]
> Add an attribute to tell Git what encoding the user has defined for a
> given file. If the content is added to the index, then Git converts the
> content to a canonical UTF-8 representation. On checkout Git will
> reverse the conversion.
>
> Signed-off-by: Lars Schneider 
> ---
>  Documentation/gitattributes.txt  |  80 +++
> diff --git a/convert.c b/convert.c
> @@ -265,6 +266,78 @@ static int will_convert_lf_to_crlf(size_t len, struct 
> text_stat *stats,
> +static const char *default_encoding = "UTF-8";
> @@ -978,6 +1051,21 @@ static int ident_to_worktree(const char *path, const 
> char *src, size_t len,
> +static const char *git_path_check_encoding(struct attr_check_item *check)
> +{
> +   const char *value = check->value;
> +
> +   if (ATTR_TRUE(value) || ATTR_FALSE(value) || ATTR_UNSET(value) ||
> +   !strlen(value))
> +   return NULL;
> +
> +   /* Don't encode to the default encoding */
> +   if (!strcasecmp(value, default_encoding))
> +   return NULL;

As of v10, the rest of the code accepts encoding names "UTF-xx" and
"UTFxx" (case insensitive), however, this check recognizes only
"UTF-8" (case insensitive). For consistency, one would expect this
also to recognize "UTF8" (case insensitive).

> +
> +   return value;
> +}


Re: [PATCH v9 4/8] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-07 Thread Torsten Bögershausen
On Tue, Mar 06, 2018 at 03:37:16PM -0800, Junio C Hamano wrote:
> Lars Schneider  writes:
> 
> > After thinking about it I wonder if we should barf on "utf16" without
> > dash. Your Linux iconv would handle this correctly. My macOS iconv would 
> > not.
> > That means the repo would checkout correctly on your machine but not on 
> > mine.
> >
> > What do you think?
> 
> To be bluntly honest, I prefer not to have excess "sanity checks";
> there is no need to barf on utf16 in a project run by those who are
> without macOS friends, for example.  For that matter, while I do not
> hate it so much to reject it, I am not all that supportive of this
> "The consortium says without -LE or -BE suffix there must be BOM,
> and this lacks one, so barf loudly" step in this topic myself.

Loosing or adding a BOM couild render a file useless, depending
on the SW that reads and processes it.

The main reason for being critacal is to make sure that the
material in the working-tree does a safe roundtrip when commited,
pushed, pulled, checked out...

The best thing we can do is probably to follow the specification as
good as possible.

Having a clear specification (UTF-16LE/BE has no BOM, UTF-16 has none,
UTF-16 has a BOM) would even open the chance to work around a buggy
iconv library, because Git can check the BOM.
If, and only if, a platform messes it up, and we want to
make a (compile time) workaround in Git for this very platform.

A consistant behavior across all platforms is a good thing,
sometimes I have a network share mounted to Linux, MacOS and
Windows and it doesn't matter under which Git on which
machine I checkout or commit.

Oh, I see, there is a new patch coming...


[PATCH v10 8/9] convert: add tracing for 'working-tree-encoding' attribute

2018-03-07 Thread lars . schneider
From: Lars Schneider 

Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable
tracing for content that is reencoded with the 'working-tree-encoding'
attribute. This is useful to debug encoding issues.

Signed-off-by: Lars Schneider 
---
 convert.c| 25 +
 t/t0028-working-tree-encoding.sh |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/convert.c b/convert.c
index f19a15dd02..5776dfbc99 100644
--- a/convert.c
+++ b/convert.c
@@ -318,6 +318,29 @@ static int validate_encoding(const char *path, const char 
*enc,
return 0;
 }
 
+static void trace_encoding(const char *context, const char *path,
+  const char *encoding, const char *buf, size_t len)
+{
+   static struct trace_key coe = TRACE_KEY_INIT(WORKING_TREE_ENCODING);
+   struct strbuf trace = STRBUF_INIT;
+   int i;
+
+   strbuf_addf(, "%s (%s, considered %s):\n", context, path, 
encoding);
+   for (i = 0; i < len && buf; ++i) {
+   strbuf_addf(
+   ,"| \e[2m%2i:\e[0m %2x \e[2m%c\e[0m%c",
+   i,
+   (unsigned char) buf[i],
+   (buf[i] > 32 && buf[i] < 127 ? buf[i] : ' '),
+   ((i+1) % 8 && (i+1) < len ? ' ' : '\n')
+   );
+   }
+   strbuf_addchars(, '\n', 1);
+
+   trace_strbuf(, );
+   strbuf_release();
+}
+
 static const char *default_encoding = "UTF-8";
 
 static int encode_to_git(const char *path, const char *src, size_t src_len,
@@ -346,6 +369,7 @@ static int encode_to_git(const char *path, const char *src, 
size_t src_len,
if (validate_encoding(path, enc, src, src_len, die_on_error))
return 0;
 
+   trace_encoding("source", path, enc, src, src_len);
dst = reencode_string_len(src, src_len, default_encoding, enc,
  _len);
if (!dst) {
@@ -363,6 +387,7 @@ static int encode_to_git(const char *path, const char *src, 
size_t src_len,
return 0;
}
}
+   trace_encoding("destination", path, default_encoding, dst, dst_len);
 
strbuf_attach(buf, dst, dst_len, dst_len + 1);
return 1;
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index aa05f82166..6f3a82f61b 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -4,6 +4,8 @@ test_description='working-tree-encoding conversion via 
gitattributes'
 
 . ./test-lib.sh
 
+GIT_TRACE_WORKING_TREE_ENCODING=1 && export GIT_TRACE_WORKING_TREE_ENCODING
+
 test_expect_success 'setup test files' '
git config core.eol lf &&
 
-- 
2.16.2



[PATCH v10 6/9] convert: add 'working-tree-encoding' attribute

2018-03-07 Thread lars . schneider
From: Lars Schneider 

Git recognizes files encoded with ASCII or one of its supersets (e.g.
UTF-8 or ISO-8859-1) as text files. All other encodings are usually
interpreted as binary and consequently built-in Git text processing
tools (e.g. 'git diff') as well as most Git web front ends do not
visualize the content.

Add an attribute to tell Git what encoding the user has defined for a
given file. If the content is added to the index, then Git converts the
content to a canonical UTF-8 representation. On checkout Git will
reverse the conversion.

Signed-off-by: Lars Schneider 
---
 Documentation/gitattributes.txt  |  80 +++
 convert.c| 110 +++-
 convert.h|   1 +
 sha1_file.c  |   2 +-
 t/t0028-working-tree-encoding.sh | 133 +++
 5 files changed, 324 insertions(+), 2 deletions(-)
 create mode 100755 t/t0028-working-tree-encoding.sh

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 30687de81a..31a4f92840 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -272,6 +272,86 @@ few exceptions.  Even though...
   catch potential problems early, safety triggers.
 
 
+`working-tree-encoding`
+^^^
+
+Git recognizes files encoded in ASCII or one of its supersets (e.g.
+UTF-8, ISO-8859-1, ...) as text files. Files encoded in certain other
+encodings (e.g. UTF-16) are interpreted as binary and consequently
+built-in Git text processing tools (e.g. 'git diff') as well as most Git
+web front ends do not visualize the contents of these files by default.
+
+In these cases you can tell Git the encoding of a file in the working
+directory with the `working-tree-encoding` attribute. If a file with this
+attribute is added to Git, then Git reencodes the content from the
+specified encoding to UTF-8. Finally, Git stores the UTF-8 encoded
+content in its internal data structure (called "the index"). On checkout
+the content is reencoded back to the specified encoding.
+
+Please note that using the `working-tree-encoding` attribute may have a
+number of pitfalls:
+
+- Alternative Git implementations (e.g. JGit or libgit2) and older Git
+  versions (as of March 2018) do not support the `working-tree-encoding`
+  attribute. If you decide to use the `working-tree-encoding` attribute
+  in your repository, then it is strongly recommended to ensure that all
+  clients working with the repository support it.
+
+  For example, Microsoft Visual Studio resources files (`*.rc`) or
+  PowerShell script files (`*.ps1`) are sometimes encoded in UTF-16.
+  If you declare `*.ps1` as files as UTF-16 and you add `foo.ps1` with
+  a `working-tree-encoding` enabled Git client, then `foo.ps1` will be
+  stored as UTF-8 internally. A client without `working-tree-encoding`
+  support will checkout `foo.ps1` as UTF-8 encoded file. This will
+  typically cause trouble for the users of this file.
+
+  If a Git client, that does not support the `working-tree-encoding`
+  attribute, adds a new file `bar.ps1`, then `bar.ps1` will be
+  stored "as-is" internally (in this example probably as UTF-16).
+  A client with `working-tree-encoding` support will interpret the
+  internal contents as UTF-8 and try to convert it to UTF-16 on checkout.
+  That operation will fail and cause an error.
+
+- Reencoding content requires resources that might slow down certain
+  Git operations (e.g 'git checkout' or 'git add').
+
+Use the `working-tree-encoding` attribute only if you cannot store a file
+in UTF-8 encoding and if you want Git to be able to process the content
+as text.
+
+As an example, use the following attributes if your '*.ps1' files are
+UTF-16 encoded with byte order mark (BOM) and you want Git to perform
+automatic line ending conversion based on your platform.
+
+
+*.ps1  text working-tree-encoding=UTF-16
+
+
+Use the following attributes if your '*.ps1' files are UTF-16 little
+endian encoded without BOM and you want Git to use Windows line endings
+in the working directory. Please note, it is highly recommended to
+explicitly define the line endings with `eol` if the `working-tree-encoding`
+attribute is used to avoid ambiguity.
+
+
+*.ps1  text working-tree-encoding=UTF-16LE eol=CRLF
+
+
+You can get a list of all available encodings on your platform with the
+following command:
+
+
+iconv --list
+
+
+If you do not know the encoding of a file, then you can use the `file`
+command to guess the encoding:
+
+
+file foo.ps1
+
+
+
 `ident`
 ^^^
 
diff --git a/convert.c b/convert.c
index b976eb968c..80549ff2b5 100644
--- a/convert.c
+++ b/convert.c
@@ -7,6 +7,7 

[PATCH v10 5/9] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-07 Thread lars . schneider
From: Lars Schneider 

If the endianness is not defined in the encoding name, then let's
be strict and require a BOM to avoid any encoding confusion. The
is_missing_required_utf_bom() function returns true if a required BOM
is missing.

The Unicode standard instructs to assume big-endian if there in no BOM
for UTF-16/32 [1][2]. However, the W3C/WHATWG encoding standard used
in HTML5 recommends to assume little-endian to "deal with deployed
content" [3]. Strictly requiring a BOM seems to be the safest option
for content in Git.

This function is used in a subsequent commit.

[1] http://unicode.org/faq/utf_bom.html#gen6
[2] http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf
 Section 3.10, D98, page 132
[3] https://encoding.spec.whatwg.org/#utf-16le

Signed-off-by: Lars Schneider 
---
 utf8.c | 13 +
 utf8.h | 19 +++
 2 files changed, 32 insertions(+)

diff --git a/utf8.c b/utf8.c
index e4b99580f0..81c6678df1 100644
--- a/utf8.c
+++ b/utf8.c
@@ -564,6 +564,19 @@ int has_prohibited_utf_bom(const char *enc, const char 
*data, size_t len)
);
 }
 
+int is_missing_required_utf_bom(const char *enc, const char *data, size_t len)
+{
+   return (
+  (!strcasecmp(enc, "UTF-16") || !strcasecmp(enc, "UTF16")) &&
+  !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
+has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)))
+   ) || (
+  (!strcasecmp(enc, "UTF-32") || !strcasecmp(enc, "UTF32")) &&
+  !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
+has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)))
+   );
+}
+
 /*
  * Returns first character length in bytes for multi-byte `text` according to
  * `encoding`.
diff --git a/utf8.h b/utf8.h
index 0db1db4519..cce654a64a 100644
--- a/utf8.h
+++ b/utf8.h
@@ -79,4 +79,23 @@ void strbuf_utf8_align(struct strbuf *buf, align_type 
position, unsigned int wid
  */
 int has_prohibited_utf_bom(const char *enc, const char *data, size_t len);
 
+/*
+ * If the endianness is not defined in the encoding name, then we
+ * require a BOM. The function returns true if a required BOM is missing.
+ *
+ * The Unicode standard instructs to assume big-endian if there in no
+ * BOM for UTF-16/32 [1][2]. However, the W3C/WHATWG encoding standard
+ * used in HTML5 recommends to assume little-endian to "deal with
+ * deployed content" [3].
+ *
+ * Therefore, strictly requiring a BOM seems to be the safest option for
+ * content in Git.
+ *
+ * [1] http://unicode.org/faq/utf_bom.html#gen6
+ * [2] http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf
+ * Section 3.10, D98, page 132
+ * [3] https://encoding.spec.whatwg.org/#utf-16le
+ */
+int is_missing_required_utf_bom(const char *enc, const char *data, size_t len);
+
 #endif
-- 
2.16.2



[PATCH v10 4/9] utf8: add function to detect prohibited UTF-16/32 BOM

2018-03-07 Thread lars . schneider
From: Lars Schneider 

Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE
or UTF-32LE a BOM must not be used [1]. The function returns true if
this is the case.

This function is used in a subsequent commit.

[1] http://unicode.org/faq/utf_bom.html#bom10

Signed-off-by: Lars Schneider 
---
 utf8.c | 26 ++
 utf8.h |  9 +
 2 files changed, 35 insertions(+)

diff --git a/utf8.c b/utf8.c
index 2c27ce0137..e4b99580f0 100644
--- a/utf8.c
+++ b/utf8.c
@@ -538,6 +538,32 @@ char *reencode_string_len(const char *in, int insz,
 }
 #endif
 
+static int has_bom_prefix(const char *data, size_t len,
+ const char *bom, size_t bom_len)
+{
+   return (len >= bom_len) && !memcmp(data, bom, bom_len);
+}
+
+static const char utf16_be_bom[] = {0xFE, 0xFF};
+static const char utf16_le_bom[] = {0xFF, 0xFE};
+static const char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF};
+static const char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00};
+
+int has_prohibited_utf_bom(const char *enc, const char *data, size_t len)
+{
+   return (
+ (!strcasecmp(enc, "UTF-16BE") || !strcasecmp(enc, "UTF-16LE") ||
+  !strcasecmp(enc, "UTF16BE") || !strcasecmp(enc, "UTF16LE")) &&
+ (has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
+  has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)))
+   ) || (
+ (!strcasecmp(enc, "UTF-32BE") || !strcasecmp(enc, "UTF-32LE") ||
+  !strcasecmp(enc, "UTF32BE") || !strcasecmp(enc, "UTF32LE")) &&
+ (has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
+  has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)))
+   );
+}
+
 /*
  * Returns first character length in bytes for multi-byte `text` according to
  * `encoding`.
diff --git a/utf8.h b/utf8.h
index 6bbcf31a83..0db1db4519 100644
--- a/utf8.h
+++ b/utf8.h
@@ -70,4 +70,13 @@ typedef enum {
 void strbuf_utf8_align(struct strbuf *buf, align_type position, unsigned int 
width,
   const char *s);
 
+/*
+ * If a data stream is declared as UTF-16BE or UTF-16LE, then a UTF-16
+ * BOM must not be used [1]. The same applies for the UTF-32 equivalents.
+ * The function returns true if this rule is violated.
+ *
+ * [1] http://unicode.org/faq/utf_bom.html#bom10
+ */
+int has_prohibited_utf_bom(const char *enc, const char *data, size_t len);
+
 #endif
-- 
2.16.2



[PATCH v10 0/9] convert: add support for different encodings

2018-03-07 Thread lars . schneider
From: Lars Schneider 

Hi,

Patches 1-5,8 are preparation and helper functions. Patch 3 is new.
Patch 6,7,9 are the actual change.

This series depends on Torsten's 8462ff43e4 (convert_to_git():
safe_crlf/checksafe becomes int conv_flags, 2018-01-13) which is
already in master.

Changes since v9:

* make has_bom_prefix() / is_missing_required_utf_bom() more lenient in
  what they accept (ignore casing, accept UTF?? and UTF-?? , Junio)
* replace memcmp() which does not check the length of the strings with
  a case insensitive variant of starts_with() (Junio)
* do not convert encoding names to uppercase
  (this fixes a leak introduced in the last iteration, Eric)
* do not cleanup test files that the test did not create (Eric)
* do not cleanup err.out files in tests (Eric)

I did not address Eric's feedback to make validate_encoding()
cleaner [1] as I want to stabilize the series and Eric wrote
that we can clean this up later:
http://public-inbox.org/git/capig+csoka-ybtybz42jgqtych7ldwntoeovdzfg0_64o9q...@mail.gmail.com

Thanks,
Lars


  RFC: 
https://public-inbox.org/git/bdb9b884-6d17-4be3-a83c-f67e2afa2...@gmail.com/
   v1: 
https://public-inbox.org/git/20171211155023.1405-1-lars.schnei...@autodesk.com/
   v2: 
https://public-inbox.org/git/2017122915.39680-1-lars.schnei...@autodesk.com/
   v3: 
https://public-inbox.org/git/20180106004808.77513-1-lars.schnei...@autodesk.com/
   v4: 
https://public-inbox.org/git/20180120152418.52859-1-lars.schnei...@autodesk.com/
   v5: https://public-inbox.org/git/20180129201855.9182-1-tbo...@web.de/
   v6: 
https://public-inbox.org/git/20180209132830.55385-1-lars.schnei...@autodesk.com/
   v7: 
https://public-inbox.org/git/20180215152711.158-1-lars.schnei...@autodesk.com/
   v8: 
https://public-inbox.org/git/20180224162801.98860-1-lars.schnei...@autodesk.com/
   v9: 
https://public-inbox.org/git/20180304201418.60958-1-lars.schnei...@autodesk.com/



Base Ref:
Web-Diff: https://github.com/larsxschneider/git/commit/a602b8dcef
Checkout: git fetch https://github.com/larsxschneider/git encoding-v10 && git 
checkout a602b8dcef


### Interdiff (v9..v10):

diff --git a/convert.c b/convert.c
index 6cbb2b2618..e861f1abbc 100644
--- a/convert.c
+++ b/convert.c
@@ -269,7 +269,8 @@ static int will_convert_lf_to_crlf(size_t len, struct 
text_stat *stats,
 static int validate_encoding(const char *path, const char *enc,
  const char *data, size_t len, int die_on_error)
 {
-   if (!memcmp("UTF-", enc, 4)) {
+   /* We only check for UTF here as UTF?? can be an alias for UTF-?? */
+   if (startscase_with(enc, "UTF")) {
/*
 * Check for detectable errors in UTF encodings
 */
@@ -277,16 +278,18 @@ static int validate_encoding(const char *path, const char 
*enc,
const char *error_msg = _(
"BOM is prohibited in '%s' if encoded as %s");
/*
-* This advice is shown for UTF-??BE and UTF-??LE
-* encodings. We truncate the encoding name to 6
-* chars with %.6s to cut off the last two "byte
-* order" characters.
+* This advice is shown for UTF-??BE and UTF-??LE 
encodings.
+* We cut off the last two characters of the encoding 
name
+# to generate the encoding name suitable for BOMs.
 */
const char *advise_msg = _(
"The file '%s' contains a byte order "
-   "mark (BOM). Please use %.6s as "
+   "mark (BOM). Please use %s as "
"working-tree-encoding.");
-   advise(advise_msg, path, enc);
+   char *upper_enc = xstrdup_toupper(enc);
+   upper_enc[strlen(upper_enc)-2] = '\0';
+   advise(advise_msg, path, upper_enc);
+   free(upper_enc);
if (die_on_error)
die(error_msg, path, enc);
else {
@@ -301,7 +304,9 @@ static int validate_encoding(const char *path, const char 
*enc,
"mark (BOM). Please use %sBE or %sLE "
"(depending on the byte order) as "
"working-tree-encoding.");
-   advise(advise_msg, path, enc, enc);
+   char *upper_enc = xstrdup_toupper(enc);
+   advise(advise_msg, path, upper_enc, upper_enc);
+   free(upper_enc);
if (die_on_error)
die(error_msg, path, enc);
else {
@@ -1216,11 +1221,7 @@ static const char *git_path_check_encoding(struct 
attr_check_item 

[PATCH v10 1/9] strbuf: remove unnecessary NUL assignment in xstrdup_tolower()

2018-03-07 Thread lars . schneider
From: Lars Schneider 

Since 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22) we
allocate the buffer for the lower case string with xmallocz(). This
already ensures a NUL at the end of the allocated buffer.

Remove the unnecessary assignment.

Signed-off-by: Lars Schneider 
---
 strbuf.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/strbuf.c b/strbuf.c
index 1df674e919..55b7daeb35 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -781,7 +781,6 @@ char *xstrdup_tolower(const char *string)
result = xmallocz(len);
for (i = 0; i < len; i++)
result[i] = tolower(string[i]);
-   result[i] = '\0';
return result;
 }
 
-- 
2.16.2



[PATCH v10 7/9] convert: check for detectable errors in UTF encodings

2018-03-07 Thread lars . schneider
From: Lars Schneider 

Check that new content is valid with respect to the user defined
'working-tree-encoding' attribute.

Signed-off-by: Lars Schneider 
---
 convert.c| 55 +++
 t/t0028-working-tree-encoding.sh | 56 
 2 files changed, 111 insertions(+)

diff --git a/convert.c b/convert.c
index 80549ff2b5..f19a15dd02 100644
--- a/convert.c
+++ b/convert.c
@@ -266,6 +266,58 @@ static int will_convert_lf_to_crlf(size_t len, struct 
text_stat *stats,
 
 }
 
+static int validate_encoding(const char *path, const char *enc,
+ const char *data, size_t len, int die_on_error)
+{
+   /* We only check for UTF here as UTF?? can be an alias for UTF-?? */
+   if (startscase_with(enc, "UTF")) {
+   /*
+* Check for detectable errors in UTF encodings
+*/
+   if (has_prohibited_utf_bom(enc, data, len)) {
+   const char *error_msg = _(
+   "BOM is prohibited in '%s' if encoded as %s");
+   /*
+* This advice is shown for UTF-??BE and UTF-??LE 
encodings.
+* We cut off the last two characters of the encoding 
name
+# to generate the encoding name suitable for BOMs.
+*/
+   const char *advise_msg = _(
+   "The file '%s' contains a byte order "
+   "mark (BOM). Please use %s as "
+   "working-tree-encoding.");
+   char *upper_enc = xstrdup_toupper(enc);
+   upper_enc[strlen(upper_enc)-2] = '\0';
+   advise(advise_msg, path, upper_enc);
+   free(upper_enc);
+   if (die_on_error)
+   die(error_msg, path, enc);
+   else {
+   return error(error_msg, path, enc);
+   }
+
+   } else if (is_missing_required_utf_bom(enc, data, len)) {
+   const char *error_msg = _(
+   "BOM is required in '%s' if encoded as %s");
+   const char *advise_msg = _(
+   "The file '%s' is missing a byte order "
+   "mark (BOM). Please use %sBE or %sLE "
+   "(depending on the byte order) as "
+   "working-tree-encoding.");
+   char *upper_enc = xstrdup_toupper(enc);
+   advise(advise_msg, path, upper_enc, upper_enc);
+   free(upper_enc);
+   if (die_on_error)
+   die(error_msg, path, enc);
+   else {
+   return error(error_msg, path, enc);
+   }
+   }
+
+   }
+   return 0;
+}
+
 static const char *default_encoding = "UTF-8";
 
 static int encode_to_git(const char *path, const char *src, size_t src_len,
@@ -291,6 +343,9 @@ static int encode_to_git(const char *path, const char *src, 
size_t src_len,
if (!buf && !src)
return 1;
 
+   if (validate_encoding(path, enc, src, src_len, die_on_error))
+   return 0;
+
dst = reencode_string_len(src, src_len, default_encoding, enc,
  _len);
if (!dst) {
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index 89b4dbee1d..aa05f82166 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -62,6 +62,46 @@ test_expect_success 'check $GIT_DIR/info/attributes support' 
'
 
 for i in 16 32
 do
+   test_expect_success "check prohibited UTF-${i} BOM" '
+   test_when_finished "git reset --hard HEAD" &&
+
+   echo "*.utf${i}be text working-tree-encoding=utf-${i}be" 
>>.gitattributes &&
+   echo "*.utf${i}le text working-tree-encoding=utf-${i}le" 
>>.gitattributes &&
+
+   # Here we add a UTF-16 (resp. UTF-32) files with BOM 
(big/little-endian)
+   # but we tell Git to treat it as UTF-16BE/UTF-16LE (resp. 
UTF-32).
+   # In these cases the BOM is prohibited.
+   cp bebom.utf${i}be.raw bebom.utf${i}be &&
+   test_must_fail git add bebom.utf${i}be 2>err.out &&
+   test_i18ngrep "fatal: BOM is prohibited .* utf-${i}be" err.out 
&&
+
+   cp lebom.utf${i}le.raw lebom.utf${i}be &&
+   test_must_fail git add lebom.utf${i}be 2>err.out &&
+   test_i18ngrep "fatal: BOM is prohibited .* utf-${i}be" err.out 
&&
+
+   cp bebom.utf${i}be.raw bebom.utf${i}le &&
+   

[PATCH v10 9/9] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-07 Thread lars . schneider
From: Lars Schneider 

UTF supports lossless conversion round tripping and conversions between
UTF and other encodings are mostly round trip safe as Unicode aims to be
a superset of all other character encodings. However, certain encodings
(e.g. SHIFT-JIS) are known to have round trip issues [1].

Add 'core.checkRoundtripEncoding', which contains a comma separated
list of encodings, to define for what encodings Git should check the
conversion round trip if they are used in the 'working-tree-encoding'
attribute.

Set SHIFT-JIS as default value for 'core.checkRoundtripEncoding'.

[1] 
https://support.microsoft.com/en-us/help/170559/prb-conversion-problem-between-shift-jis-and-unicode

Signed-off-by: Lars Schneider 
---
 Documentation/config.txt |  6 
 Documentation/gitattributes.txt  |  8 +
 config.c |  5 +++
 convert.c| 78 
 convert.h|  1 +
 environment.c|  1 +
 t/t0028-working-tree-encoding.sh | 39 
 7 files changed, 138 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0e25b2c92b..d7a56054a5 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -530,6 +530,12 @@ core.autocrlf::
This variable can be set to 'input',
in which case no output conversion is performed.
 
+core.checkRoundtripEncoding::
+   A comma separated list of encodings that Git performs UTF-8 round
+   trip checks on if they are used in an `working-tree-encoding`
+   attribute (see linkgit:gitattributes[5]). The default value is
+   `SHIFT-JIS`.
+
 core.symlinks::
If false, symbolic links are checked out as small plain files that
contain the link text. linkgit:git-update-index[1] and
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 31a4f92840..aa3deae392 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -312,6 +312,14 @@ number of pitfalls:
   internal contents as UTF-8 and try to convert it to UTF-16 on checkout.
   That operation will fail and cause an error.
 
+- Reencoding content to non-UTF encodings can cause errors as the
+  conversion might not be UTF-8 round trip safe. If you suspect your
+  encoding to not be round trip safe, then add it to
+  `core.checkRoundtripEncoding` to make Git check the round trip
+  encoding (see linkgit:git-config[1]). SHIFT-JIS (Japanese character
+  set) is known to have round trip issues with UTF-8 and is checked by
+  default.
+
 - Reencoding content requires resources that might slow down certain
   Git operations (e.g 'git checkout' or 'git add').
 
diff --git a/config.c b/config.c
index 1f003fbb90..d0ada9fcd4 100644
--- a/config.c
+++ b/config.c
@@ -1172,6 +1172,11 @@ static int git_default_core_config(const char *var, 
const char *value)
return 0;
}
 
+   if (!strcmp(var, "core.checkroundtripencoding")) {
+   check_roundtrip_encoding = xstrdup(value);
+   return 0;
+   }
+
if (!strcmp(var, "core.notesref")) {
notes_ref_name = xstrdup(value);
return 0;
diff --git a/convert.c b/convert.c
index 5776dfbc99..e861f1abbc 100644
--- a/convert.c
+++ b/convert.c
@@ -341,6 +341,43 @@ static void trace_encoding(const char *context, const char 
*path,
strbuf_release();
 }
 
+static int check_roundtrip(const char* enc_name)
+{
+   /*
+* check_roundtrip_encoding contains a string of space and/or
+* comma separated encodings (eg. "UTF-16, ASCII, CP1125").
+* Search for the given encoding in that string.
+*/
+   const char *found = strcasestr(check_roundtrip_encoding, enc_name);
+   const char *next;
+   int len;
+   if (!found)
+   return 0;
+   next = found + strlen(enc_name);
+   len = strlen(check_roundtrip_encoding);
+   return (found && (
+   /*
+* check that the found encoding is at the
+* beginning of check_roundtrip_encoding or
+* that it is prefixed with a space or comma
+*/
+   found == check_roundtrip_encoding || (
+   found > check_roundtrip_encoding &&
+   (*(found-1) == ' ' || *(found-1) == ',')
+   )
+   ) && (
+   /*
+* check that the found encoding is at the
+* end of check_roundtrip_encoding or
+* that it is suffixed with a space or comma
+*/
+   next == check_roundtrip_encoding + len || (
+   next < check_roundtrip_encoding + len &&
+

[PATCH v10 3/9] strbuf: add a case insensitive starts_with()

2018-03-07 Thread lars . schneider
From: Lars Schneider 

Check in a case insensitive manner if one string is a prefix of another
string.

This function is used in a subsequent commit.

Signed-off-by: Lars Schneider 
---
 git-compat-util.h | 1 +
 strbuf.c  | 9 +
 2 files changed, 10 insertions(+)

diff --git a/git-compat-util.h b/git-compat-util.h
index 68b2ad531e..f648da0c11 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -455,6 +455,7 @@ extern void (*get_warn_routine(void))(const char *warn, 
va_list params);
 extern void set_die_is_recursing_routine(int (*routine)(void));
 
 extern int starts_with(const char *str, const char *prefix);
+extern int startscase_with(const char *str, const char *prefix);
 
 /*
  * If the string "str" begins with the string found in "prefix", return 1.
diff --git a/strbuf.c b/strbuf.c
index b635f0bdc4..5779a2d591 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -11,6 +11,15 @@ int starts_with(const char *str, const char *prefix)
return 0;
 }
 
+int startscase_with(const char *str, const char *prefix)
+{
+   for (; ; str++, prefix++)
+   if (!*prefix)
+   return 1;
+   else if (tolower(*str) != tolower(*prefix))
+   return 0;
+}
+
 int skip_to_optional_arg_default(const char *str, const char *prefix,
 const char **arg, const char *def)
 {
-- 
2.16.2



[PATCH v10 2/9] strbuf: add xstrdup_toupper()

2018-03-07 Thread lars . schneider
From: Lars Schneider 

Create a copy of an existing string and make all characters upper case.
Similar xstrdup_tolower().

This function is used in a subsequent commit.

Signed-off-by: Lars Schneider 
---
 strbuf.c | 12 
 strbuf.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/strbuf.c b/strbuf.c
index 55b7daeb35..b635f0bdc4 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -784,6 +784,18 @@ char *xstrdup_tolower(const char *string)
return result;
 }
 
+char *xstrdup_toupper(const char *string)
+{
+   char *result;
+   size_t len, i;
+
+   len = strlen(string);
+   result = xmallocz(len);
+   for (i = 0; i < len; i++)
+   result[i] = toupper(string[i]);
+   return result;
+}
+
 char *xstrvfmt(const char *fmt, va_list ap)
 {
struct strbuf buf = STRBUF_INIT;
diff --git a/strbuf.h b/strbuf.h
index 14c8c10d66..df7ced53ed 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -607,6 +607,7 @@ __attribute__((format (printf,2,3)))
 extern int fprintf_ln(FILE *fp, const char *fmt, ...);
 
 char *xstrdup_tolower(const char *);
+char *xstrdup_toupper(const char *);
 
 /**
  * Create a newly allocated string using printf format. You can do this easily
-- 
2.16.2



Re: [RFC v2] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-03-07 Thread Sergey Organov
Hi Johannes,

Johannes Schindelin  writes:
> Hi Sergey,
>
> On Wed, 7 Mar 2018, Sergey Organov wrote:
>
>> Johannes Schindelin  writes:
>> 
>> > On Tue, 6 Mar 2018, Sergey Organov wrote:
>> >
>> >> This is v2 of my "Rebasing merges" proposal.
>> >
>> > Didn't we settle on Phillip's "perform successive three-way merges
>> > between the original merge commit and the new tips with the old tips
>> > as base" strategy?
>> 
>> It seems you did, dunno exactly why.
>
> That is not true. You make it sound like I was the only one who liked
> this, and not Phillip and Buga, too.
>
> Are you interested in the best solution, or in your solution :-)

I'm interested in any that works, and only you say that those suggested
by Phillip is somehow superior. I still believe it's mine that superior,
even if slightly.

>
>> The main problem with this decision is that we still don't see how and
>> when to stop for user amendment using this method. OTOH, the original
>> has this issue carefully discussed.
>
> Why would we want to stop, unless there are merge conflicts?

There is somewhat lengthy discussion about it that you probably missed.
Not to repeat it, just see how 'rerere' works when it fires during
rebase, even with no conflicts.

>
>> > It has the following advantages:
>> >
>> > - it is *very simple* to describe
>> 
>> The original is as simple if not simpler:
>> 
>> "rebase sides of the merge commit and then three-way merge them back
>> using original merge commit as base"
>
> And that is also wrong, as I had proved already! Only Buga's addition made
> it robust against dropping/modifying commits, and that addition also makes
> it more complicated.

No. Get your facts straight. The [RFC v2] already fixed that original
mistake. Could you please finally read it?

> And it still has no satisfactory simple explanation why it works.

It has. It's there in the [RFC v2]. You seem to be the only one who
doesn't get it. I suppose you just didn't bother to read.

>> No problems with octopuses, and no virtual merge bases of recursive
>> merges to reason about.
>
> But those are no problems for Phillip's strategy, either!

I thought it was you who started to discuss virtual merge bases and
related problems, as well as how it's difficult to support octopus
merges, but it's fine with me if there are none of these problems.

>
> So your point is...?

Still the same -- use what's better, the [RFC v2].

>
>> > - it is *very easy* to reason about, once it is pointed out that
>> > rebases and merges result in the same trees.
>> 
>> The original is as easy to reason about, if not easier, especially as
>> recursive merge strategy is not being used there in new ways.
>
> So do it. I still have to hear a single-sentence, clear and obvious
> explanation why it works.
>
> And please do not describe why your original version works, because it
> does not work.

Original [RFC] didn't work because of rather simple mistake that I've
already admitted and fixed. [RFC v2] has got the fix. Read [RFC v2] and
get your facts straight.

> Describe why the one amended with Buga's hack works.

It doesn't matter as these hacks are not needed anymore.

>
>> I honestly don't see any advantages of Phillip's method over the
>> original, except personal preferences. At the same time, I have no
>> objection of using it either, provided consistency check problem is
>> solved there as well.
>
> Okay, let me reiterate then, because I do not want this point to be
> missed:
>
> Phillip's method is essentially merging the new tips into the original
> merge, pretending that the new tips were not rebased but merged into
> upstream.
>
> So it exploits the duality of the rebase and merge operation, which both
> result in identical trees (potentially after resolving merge
> conflicts).
>
> I cannot think of any such interpretation for your proposal augmented by
> Buga's fix-ups. And I haven't heard any such interpretation from your
> side, either.

No fix-ups or augmentations are needed. It was a mistake that has been
fixed in [RFC v2]. You've missed essential part of the discussion.

Read the [RFC v2], please:

Significant changes are:

1. Fixed mistake in the final merge step in the original proposal: wrong
   merge base was used.

-- Sergey


(nessun oggetto)

2018-03-07 Thread Mr Sheng Li Hung



--
I am Mr.Sheng Li Hung I have a very profitable business proposition for 
you





Re: [PATCH] xdiff: improve trimming preprocessing

2018-03-07 Thread René Scharfe
Am 07.03.2018 um 10:36 schrieb Eric Sunshine:
> On Tue, Mar 6, 2018 at 6:05 PM, Jun Wu  wrote:
>> Excerpts from Eric Sunshine's message of 2018-03-06 14:23:46 -0500:
>>> On Tue, Mar 6, 2018 at 6:53 AM, Jun Wu  wrote:
 +  printf "x%.0s" {1..934} >>d # pad common suffix to 1024 bytes
>>>
>>> The expression {x..y} is not portable to non-POSIX shells.
>>
>> Is there a recommended way to generate a repetitive string?
>> Maybe `seq 1000 | sed 's/.*/x/'` ?
> 
> That seems reasonable, although you'd want to use the test suite's
> more portable test_seq rather than seq.

You could also use Perl's repetition operator (x):

perl -e "print \"x\" x 934" >>d

It is a single command (only one fork()/exec()), simple, produces no
extra newline characters and it's already used e.g. in
t5300-pack-object.sh.

René


Re: [RFC] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-03-07 Thread Sergey Organov
Johannes Schindelin  writes:

> Hi Sergey,
>
> On Wed, 7 Mar 2018, Sergey Organov wrote:
>
>> Johannes Schindelin  writes:
>> 
>> > On Tue, 6 Mar 2018, Phillip Wood wrote:
>> >
>> >> On 03/03/18 00:29, Igor Djordjevic wrote:
>> >> > 
>> >> > On 02/03/2018 12:31, Phillip Wood wrote:
>> >> >>
>> >> >>> Thinking about it overnight, I now suspect that original proposal
>> >> >>> had a mistake in the final merge step. I think that what you did is
>> >> >>> a way to fix it, and I want to try to figure what exactly was wrong
>> >> >>> in the original proposal and to find simpler way of doing it right.
>> >> >>>
>> >> >>> The likely solution is to use original UM as a merge-base for final
>> >> >>> 3-way merge of U1' and U2', but I'm not sure yet. Sounds pretty
>> >> >>> natural though, as that's exactly UM from which both U1' and U2'
>> >> >>> have diverged due to rebasing and other history editing.
>> >> >>
>> >> >> Hi Sergey, I've been following this discussion from the sidelines,
>> >> >> though I haven't had time to study all the posts in this thread in
>> >> >> detail. I wonder if it would be helpful to think of rebasing a merge
>> >> >> as merging the changes in the parents due to the rebase back into the
>> >> >> original merge. So for a merge M with parents A B C that are rebased
>> >> >> to A' B' C' the rebased merge M' would be constructed by (ignoring
>> >> >> shell quoting issues)
>> >> >>
>> >> >> git checkout --detach M
>> >> >> git merge-recursive A -- M A'
>> >> >> tree=$(git write-tree)
>> >> >> git merge-recursive B -- $tree B'
>> >> >> tree=$(git write-tree)
>> >> >> git merge-recursive C -- $tree C'
>> >> >> tree=$(git write-tree)
>> >> >> M'=$(git log --pretty=%B -1 M | git commit-tree -pA' -pB' -pC')
>> >> >>
>> >> >> This should pull in all the changes from the parents while preserving
>> >> >> any evil conflict resolution in the original merge. It superficially
>> >> >> reminds me of incremental merging [1] but it's so long since I looked 
>> >> >> at
>> >> >> that I'm not sure if there are any significant similarities.
>> >> >>
>> >> >> [1] https://github.com/mhagger/git-imerge
>> >> > 
>> >> > Interesting, from quick test[3], this seems to produce the same 
>> >> > result as that other test I previously provided[2], where temporary 
>> >> > commits U1' and U2' are finally merged with original M as a base :)
>> >> > 
>> >> > Just that this looks like even more straight-forward approach...?
>> >> > 
>> >> > The only thing I wonder of here is how would we check if the 
>> >> > "rebased" merge M' was "clean", or should we stop for user amendment? 
>> >> > With that other approach Sergey described, we have U1'==U2' to test 
>> >> > with.
>> >> 
>> >> I think (though I haven't rigorously proved to myself) that in the
>> >> absence of conflicts this scheme has well defined semantics (the merges
>> >> can be commuted), so the result should be predicable from the users
>> >> point of view so maybe it could just offer an option to stop.
>> >
>> > I am not so sure that the result is independent of the order of the
>> > merges. In other words, I am not necessarily certain that it is impossible
>> > to concoct A,A',B,B' commits where merging B'/B before A'/A has a
>> > different result than merging A'/A before B'/B.
>> >
>> > Remember, when constructing counter-examples to hypotheses, those
>> > counter-examples do not really *have* to make sense on their own. For
>> > example, A' could introduce *completely different* changes from A, and the
>> > same is true for B' and B.
>> >
>> > I could imagine, for example, that using a ton of consecutive empty lines,
>> > and using patches that insert something into these empty lines (and are
>> > thusly inherently ambiguous when said set of empty lines has changed),
>> > could even introduce a merge conflict in one order, but no conflict in the
>> > other.
>> >
>> > Even so, I think that merging in the order of the parents makes the most
>> > sense, and that using that strategy makes sense, too, because you really
>> > have to try hard to make it fail.
>> 
>> Alternatively, consider to adopt the original approach that has none of
>> these issues as it uses exactly the same method for rebasing merge
>> commits that you are already using for rebasing simple commits, not to
>> mention the advantage of the built-in consistency check.
>
> Surely I misunderstand?
>
> How can your approach -- which relies *very much* on having the original
> parent commits -- not *require* that consistency check?

I don't understand what you mean, sorry. Could you please point me
to the *require* you talk about in the original proposal?

> What would your approach (that still has no satisfyingly trivial
> explanation, in my mind)

Here is one-liner: rebase sides of the merge commit and then 3-way
merge them, using original merge commit as merge base.

> do if somebody edited a `merge` command and let it merge a completely
> 

Re: [RFC] Contributing to Git (on Windows)

2018-03-07 Thread Johannes Schindelin
Hi Jonathan,

On Mon, 5 Mar 2018, Jonathan Nieder wrote:

> BTW, thanks again for writing and submitting this document.  It can't
> land soon enough. :)

It landed:

https://github.com/git-for-windows/git/blob/master/CONTRIBUTING.md

Ciao,
Dscho


Re: [RFC v2] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-03-07 Thread Johannes Schindelin
Hi Sergey,

On Wed, 7 Mar 2018, Sergey Organov wrote:

> Johannes Schindelin  writes:
> 
> > On Tue, 6 Mar 2018, Sergey Organov wrote:
> >
> >> This is v2 of my "Rebasing merges" proposal.
> >
> > Didn't we settle on Phillip's "perform successive three-way merges
> > between the original merge commit and the new tips with the old tips
> > as base" strategy?
> 
> It seems you did, dunno exactly why.

That is not true. You make it sound like I was the only one who liked
this, and not Phillip and Buga, too.

Are you interested in the best solution, or in your solution :-)

> The main problem with this decision is that we still don't see how and
> when to stop for user amendment using this method. OTOH, the original
> has this issue carefully discussed.

Why would we want to stop, unless there are merge conflicts?

> > It has the following advantages:
> >
> > - it is *very simple* to describe
> 
> The original is as simple if not simpler:
> 
> "rebase sides of the merge commit and then three-way merge them back
> using original merge commit as base"

And that is also wrong, as I had proved already! Only Buga's addition made
it robust against dropping/modifying commits, and that addition also makes
it more complicated.

And it still has no satisfactory simple explanation why it works.

> No problems with octopuses, and no virtual merge bases of recursive
> merges to reason about.

But those are no problems for Phillip's strategy, either!

So your point is...?

> > - it is *very easy* to reason about, once it is pointed out that
> > rebases and merges result in the same trees.
> 
> The original is as easy to reason about, if not easier, especially as
> recursive merge strategy is not being used there in new ways.

So do it. I still have to hear a single-sentence, clear and obvious
explanation why it works.

And please do not describe why your original version works, because it
does not work. Describe why the one amended with Buga's hack works.

> I honestly don't see any advantages of Phillip's method over the
> original, except personal preferences. At the same time, I have no
> objection of using it either, provided consistency check problem is
> solved there as well.

Okay, let me reiterate then, because I do not want this point to be
missed:

Phillip's method is essentially merging the new tips into the original
merge, pretending that the new tips were not rebased but merged into
upstream.

So it exploits the duality of the rebase and merge operation, which both
result in identical trees (potentially after resolving merge conflicts).

I cannot think of any such interpretation for your proposal augmented by
Buga's fix-ups. And I haven't heard any such interpretation from your
side, either.

> >> 3. I now use "True Merge" name instead of former "Trivial Merge", to
> >>avoid confusion with what Git documentation calls "trivial merge",
> >>thanks to Junio C Hamano  for pointing this out.
> >
> > "True Merge" is probably also a candidate for improvement. If what you
> > refer to is a "true" merge, that means all others are "untrue"
> > merges???
> 
> [d]evil merges, obviously.
> 
> Seriously, it's pure history joint. Just "joint' will do.

You might want to try harder to stick with the existing nomenclature. That
would make it a lot easier to discuss.

Ciao,
Johannes


Partial Clone: Commands that could be problematic

2018-03-07 Thread Derrick Stolee
We discussed partial clone today during the contributor's summit. There 
were some concerns about some commands that would cause over-hydration 
of blobs that need server requests to resolve.


GVFS blocks "fsck", "gc", "prune", "repack", "submodule", and "worktree" 
[1]. I promised I would include this list. There are a few limitations 
on the arguments of the other commands, as seen in that code. These 
commands that are blocked are already handled in partial clone with the 
"promisor" pattern. The "worktree" limitation is only due to the 
file-system virtualization layer of GVFS.


We discussed commands like "git grep" that sometimes look at the working 
directory and sometimes crawls trees. That is a command that should be 
considered for batching object downloads, limiting the command to 
"hydrated" blobs, or limiting to a sparse checkout.


Thanks,

-Stolee

[1] 
https://github.com/Microsoft/GVFS/blob/2db0c030eb257beebf8e17f1c2ce72ffb166f533/GVFS/GVFS.Hooks/Program.cs#L120-L137




Re: [PATCH v5 00/12] rebase -i: offer to recreate merge commits

2018-03-07 Thread Johannes Schindelin
Hi Buga,

On Tue, 6 Mar 2018, Igor Djordjevic wrote:

> [...]
>
> But before elaborating, I would like to hear your opinion on whether you
> find it worth to pursue that goal here, before `--recreate-merges` hits
> the mainstream, or it might be just fine as a possible later
> improvement, too (if accepted, that is).

As I suggested in another sub-thread, I think the best way forward is to
use your idea to make the 'rebase original merge commits' strategy
explicit.

That would not actually hold up the current --recreate-merges patch
series, but would mean to provide an add-on patch series to add support
for `merge -R` and then use that from the generated todo list.

For implementation detail reasons, it may actually make sense to integrate
those patches into the --recreate-merges patch series, though. Should not
be hard (except during GitMerge).

> p.s. lol, now that I said it, and after writing all this, I might 
> actually even like the idea of (later) having `--rebase-merges` 
> alongside `--recreate-merges`, too, each one clearly communicating 
> its default mode of operation - rebase merges vs. recreate merges... 
> as one might rightfully expect ;) Eh :P

Hehe...

Ciao,
Dscho


Re: [RFC v2] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-03-07 Thread Sergey Organov
Hi Johannes,

Johannes Schindelin  writes:

> Hi Sergey,
>
> On Tue, 6 Mar 2018, Sergey Organov wrote:
>
>> This is v2 of my "Rebasing merges" proposal.
>
> Didn't we settle on Phillip's "perform successive three-way merges between
> the original merge commit and the new tips with the old tips as base"
> strategy?

It seems you did, dunno exactly why.

The main problem with this decision is that we still don't see how and
when to stop for user amendment using this method. OTOH, the original
has this issue carefully discussed.

> It has the following advantages:
>
> - it is *very simple* to describe

The original is as simple if not simpler:

"rebase sides of the merge commit and then three-way merge them back
using original merge commit as base"

No problems with octopuses, and no virtual merge bases of recursive
merges to reason about.

> - it is *very easy* to reason about, once it is pointed out that rebases
>   and merges result in the same trees.

The original is as easy to reason about, if not easier, especially
as recursive merge strategy is not being used there in new ways.

I honestly don't see any advantages of Phillip's method over the
original, except personal preferences. At the same time, I have no
objection of using it either, provided consistency check problem is
solved there as well.

>
> ... and BTW...
>
>> 3. I now use "True Merge" name instead of former "Trivial Merge", to
>>avoid confusion with what Git documentation calls "trivial merge",
>>thanks to Junio C Hamano  for pointing this out.
>
> "True Merge" is probably also a candidate for improvement. If what you
> refer to is a "true" merge, that means all others are "untrue"
> merges???

[d]evil merges, obviously.

Seriously, it's pure history joint. Just "joint' will do.

-- Sergey


Re: [GSoC] [PATCH] travis-ci: added clang static analysis

2018-03-07 Thread Johannes Schindelin
Hi,

On Tue, 6 Mar 2018, Siddhartha Mishra wrote:

> Are there any other glaring issues you see in the code?

The rest looks good to me!

Ciao,
Johannes


Re: What's cooking in git.git (Mar 2018, #02; Tue, 6)

2018-03-07 Thread Johannes Schindelin
Hi Dan,

On Tue, 6 Mar 2018, Junio C Hamano wrote:

> * dj/runtime-prefix (2017-12-05) 4 commits
>  . exec_cmd: RUNTIME_PREFIX on some POSIX systems
>  . Makefile: add Perl runtime prefix support
>  . Makefile: add support for "perllibdir"
>  . Makefile: generate Perl header from template file
> 
>  A build-time option has been added to allow Git to be told to refer
>  to its associated files relative to the main binary, in the same
>  way that has been possible on Windows for quite some time, for
>  Linux, BSDs and Darwin.
> 
>  Perhaps it is about time to reboot the effort?

You probably missed this in the huge "What's cooking" mail. Are you game?

Ciao,
Johannes


Contributor Summit Dinner

2018-03-07 Thread Edward Thomson
Hello-

Microsoft would like to invite the contributors summit to dinner tonight;
it will take place at 8pm:

  La Tagliatella (Placa Catalunya)
  Ronda de la Universitat, 31

It's conveniently right around the corner from the Beers with Bitbucket
event, in case you're going to that.

Hope to see you there.

Thanks-
-ed


Bug report: git-stash doesn't return correct status code

2018-03-07 Thread Vromen, Tomer
Hi all,

I often use this one-liner:

> git stash && git checkout -b new-feature-branch && git stash pop

This is useful when I realize that I want to open a new branch for my changes 
(that I haven't committed yet).
However, I might have forgotten to save my changes in the editor, so git-stash 
will give this error:

No local changes to save

Despite the error, the command returns status code 0, and so the branch is 
created and the stash is popped, which was not my intention.
I think that git-stash should return a non-zero status code if it didn't push a 
new stash.

git version: 2.8.4


Thank you,
Tomer Vromen

-
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.



Re: [PATCH v2 3/5] gc --auto: exclude base pack if not enough mem to "repack -ad"

2018-03-07 Thread Duy Nguyen
On Wed, Mar 7, 2018 at 2:19 AM, Junio C Hamano  wrote:
>> +--keep-base-pack::
>> + All packs except the base pack are consolidated into a single
>> + pack. The largest pack is considered the base pack.
>
> This makes it sound as if packs with .keep are also repacked unless
> they meet the threshold for "base pack".  Is that what you actually
> implemented?

Copy/paste problem. That is, I copied this from --auto description,
but I missed the "(except those marked with a `.keep` file)" part. No,
I don't think ignoring .keep files is a good idea, at least no by
default.

> In order to do so, [2/5] needs to allow the "--keep-pack" option
> override the on-disk .keep files.  In an earlier message, I wondered
> if such an arrangement is useful in some use cases; I think it is,
> and because those who do want to see the on-disk .keep files honored
> can collect and include them in the set of packs to be kept via
> "--keep-pack" (after all this is an option for low-level scripting
> anyway).

At gc level I don't think we need to allow this. But yeah git-repack
has --pack-kept-objects to ignore .keep. If they specify this, then
repack should ignore .keep (but still follow whatever --keep-pack is
specified). There's some interesting interaction between .keep and
pack bitmap feature in pack-objects though. I'm not so sure what
happens down there yet.

>> +Set environment variable `GIT_TRACE` in order to see the memory usage
>> +estimation in `git gc --auto` that determines whether the base pack is
>> +kept.
>
> This is somewhat a puzzling use of trace.  It sounds more like a way
> to find out "how" memory usage estimation is done and arriving at a
> wrong value for those who want to debug the feature.

Yeah. I'm not sure if this estimation could be really problematic that
people need to debug this way. A cleaner way (if we think people will
need this often) is just add a new option in "git gc" to report this
estimation breakdown and do nothing else.

>> +static struct packed_git *find_the_base_pack(void)
>> +{
>> + struct packed_git *p, *base = NULL;
>> +
>> + prepare_packed_git();
>> +
>> + for (p = packed_git; p; p = p->next) {
>> + if (p->pack_local &&
>> + (!base || base->pack_size < p->pack_size))
>> + base = p;
>> + }
>> +
>> + return base;
>> +}
>
> This is finding the largest pack.

The discussion on .keep files raises one question for me, what if the
largest pack already has a .keep file, do we still consider it the
base pack, or should we find the next largest non-kept pack?

I'm guessing we keep things simple here and ignore .keep files.

>> +#ifdef HAVE_SYSINFO
>> + struct sysinfo si;
>> +
>> + if (!sysinfo())
>> + return si.totalram;
>> +#elif defined(HAVE_BSD_SYSCTL) && defined(HW_MEMSIZE)
>> + int64_t physical_memory;
>> + int mib[2];
>> + size_t length;
>> +
>> + mib[0] = CTL_HW;
>> + mib[1] = HW_MEMSIZE;
>> + length = sizeof(int64_t);
>> + if (!sysctl(mib, 2, _memory, , NULL, 0))
>> + return physical_memory;
>> +#elif defined(GIT_WINDOWS_NATIVE)
>> + MEMORYSTATUSEX memInfo;
>> +
>> + memInfo.dwLength = sizeof(MEMORYSTATUSEX);
>> + if (GlobalMemoryStatusEx())
>> + return memInfo;ullTotalPhys;
>
> Is this legal C in Microsoft land?

That's the problem with writing code without a way to test it. At
least travis helped catch a compiler bug on mac.

I'm torn between just #error here and let Windows/Mac guys implement
it (which they may scream "too much work, I don't wanna") but if I
help write something first, yes things are potentially broken and need
verification from those guys. I guess I'll just fix this up and hope
non-linux guys do the rest.

>> +#else
>> + fprintf(stderr, _("unrecognized platform, assuming %lu GB RAM\n"),
>> + default_ram);
>> +#endif
>> + return default_ram * 1024 * 1024 * 1024;
>> +}
>
> I wonder if the above should go somewhere under compat/ without
> ifdef but split into separate files for individual archs (I do not
> know the answer to this question).

My first choice too. I chose this way after seeing online_cpus()
thread-utils.c. Not sure which way is best either.

>> @@ -427,8 +557,19 @@ int cmd_gc(int argc, const char **argv, const char 
>> *prefix)
>>*/
>>   daemonized = !daemonize();
>>   }
>> - } else
>> - add_repack_all_option();
>> + } else {
>> + struct packed_git *base_pack = find_the_base_pack();
>> + struct packed_git *exclude = NULL;
>> +
>> + if (keep_base_pack != -1) {
>> + if (keep_base_pack)
>> + exclude = base_pack;
>
> OK, --keep-base-pack option always wins if given...
>
>> + } else if (base_pack && big_base_pack_threshold &&
>> +base_pack->pack_size >= 

Re: [PATCH v2 3/5] gc --auto: exclude base pack if not enough mem to "repack -ad"

2018-03-07 Thread Johannes Schindelin
Hi Duy,

On Tue, 6 Mar 2018, Nguyễn Thái Ngọc Duy wrote:

> diff --git a/builtin/gc.c b/builtin/gc.c
> index 77fa720bd0..273657ddf4 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -187,7 +211,101 @@ static int too_many_packs(void)
>   return gc_auto_pack_limit < cnt;
>  }
>  
> -static void add_repack_all_option(void)
> +static inline unsigned long total_ram(void)
> +{
> + unsigned long default_ram = 4;
> +#ifdef HAVE_SYSINFO
> + struct sysinfo si;
> +
> + if (!sysinfo())
> + return si.totalram;
> +#elif defined(HAVE_BSD_SYSCTL) && defined(HW_MEMSIZE)
> + int64_t physical_memory;
> + int mib[2];
> + size_t length;
> +
> + mib[0] = CTL_HW;
> + mib[1] = HW_MEMSIZE;
> + length = sizeof(int64_t);
> + if (!sysctl(mib, 2, _memory, , NULL, 0))
> + return physical_memory;
> +#elif defined(GIT_WINDOWS_NATIVE)
> + MEMORYSTATUSEX memInfo;
> +
> + memInfo.dwLength = sizeof(MEMORYSTATUSEX);
> + if (GlobalMemoryStatusEx())
> + return memInfo;ullTotalPhys;

This fails to compile:

builtin/gc.c: In function 'total_ram':
builtin/gc.c:235:10: error: incompatible types when returning type 
'MEMORYSTATUSEX {aka struct _MEMORYSTATUSEX}' but 'long unsigned int' was 
expected
   return memInfo;ullTotalPhys;
  ^~~
builtin/gc.c:234:2: error: this 'if' clause does not guard... 
[-Werror=misleading-indentation]
  if (GlobalMemoryStatusEx())
  ^~
builtin/gc.c:235:18: note: ...this statement, but the latter is misleadingly 
indented as if it were guarded by the 'if'
   return memInfo;ullTotalPhys;
  ^~~~
builtin/gc.c:235:18: error: 'ullTotalPhys' undeclared (first use in this 
function)
builtin/gc.c:235:18: note: each undeclared identifier is reported only once for 
each function it appears in

I suspect that the first semicolon wanted to be a period instead. At least
it fixes the build here (that's all I can test, I'm at GitMerge and miss a
very interesting discussion about the serialized commit graph to write
this):

-- snip --
diff --git a/builtin/gc.c b/builtin/gc.c
index 4f46a99ab54..9c12f1ee9af 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -232,7 +232,7 @@ static inline unsigned long total_ram(void)
 
memInfo.dwLength = sizeof(MEMORYSTATUSEX);
if (GlobalMemoryStatusEx())
-   return memInfo;ullTotalPhys;
+   return memInfo.ullTotalPhys;
 #else
fprintf(stderr, _("unrecognized platform, assuming %lu GB RAM\n"),
default_ram);

-- snap --

Junio, may I ask you to put this into a SQUASH??? commit so that the
Windows build no longer fails?

Thanks,
Dscho

Re: [PATCH v2 2/5] repack: add --keep-pack option

2018-03-07 Thread Duy Nguyen
On Wed, Mar 7, 2018 at 1:25 AM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> +--keep-pack=::
>> + Ignore the given pack. This is the equivalent of having
>> + `.keep` file on the pack. Implies `--honor-pack-keep`.
>> +
>
> A few questions I am not sure how I would answer:
>
>  - Do we want to have this listed in the SYNOPSIS section, too?
>
>  - We would want to make the SP in "" consistent with
>the dash in "" in the same document; which way do
>we make it uniform?

Probably the latter.

>
>  - Is this description clear enough to convey that we allow more
>than one instance of this option specified, and the pack names
>accumulate?

If a question is raised, it's probably not clear.

>  - Are there use cases where we want to _ignore_ on-disk ".keep" and
>only honor the ones given via the "--keep-pack" options?

I can't think of one. These .keep files are originally lock files and
ignoring them sounds like a bad idea. Perhaps we could add
--no-keep-pack later to explicit not keep a pack, ignoring .keep file
if present?

>> diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
>> index 38247afbec..553d907d34 100755
>> --- a/t/t7700-repack.sh
>> +++ b/t/t7700-repack.sh
>> @@ -196,5 +196,24 @@ test_expect_success 'objects made unreachable by grafts 
>> only are kept' '
>>   git cat-file -t $H1
>>  '
>>
>> +test_expect_success 'repack --keep-pack' '
>> + test_create_repo keep-pack &&
>> + (
>> + cd keep-pack &&
>> + for cmit in one two three four; do
>> + test_commit $cmit &&
>> + git repack -d
>> + done &&
>
> Style: replace "; " before do with LF followed by a few HT.
>
> This 'for' loop would not exit and report error if an early
> test_commit or "git repack -d" fails, would it?

Yeah. I guess I'll just unroll the loop.

>> + git repack -a -d --keep-pack $KEEP1 --keep-pack $KEEP4 &&
>> + ls .git/objects/pack/*.pack >new-counts &&
>> + test_line_count = 3 new-counts &&
>> + git fsck
>
> One important invariant for this new feature is that $KEEP1 and
> $KEEP4 will both appear in new-counts file, no?  Rename new-counts
> to new-pack-list and inspect the contents, not just line count,
> perhaps?

OK
-- 
Duy


Re: Worktree silently deleted on git fetch/gc/log

2018-03-07 Thread Duy Nguyen
On Sat, Mar 3, 2018 at 5:05 PM, Eric Sunshine  wrote:
> I can't presently think of a reason why gitdir needed/used an absolute
> or normalized path. Was it because there was some need to compare such
> paths?

No, we need to re-normalize paths before comparing anyway to be safe.
I don't think I had any special reason for using absolute path either,
except that dealing with it is a bit easier than relative one.

>> If we stored relative path in ".git" file at least, the worktree would
>> immediately fail after the user moves either the linked checkout, or
>> the main one. This should send a loud and clear signal to the user
>> "something has gone horribly" and hopefully make them connect it to
>> the last rename and undo that. "git gc" would have near zero chance to
>> kick in and destroy stale worktrees.
>
> It would detect if the main or linked worktree moved up or down one or
> more directory levels or elsewhere.
>
> It would not detect if the worktree directory was merely renamed.

True. We can be a bit more aggressive and check it anyway at command
startup (once we can peek into the main .git dir, we know where this
linked worktree is supposed to be). Or perhaps we can do this in "gc
--auto" (even though gc may end up not doing any maintenance job).
This command is usually called on heavy commands, adding one more
check should not hurt.

> Still, detecting some cases of breakage early may be better than not
> detecting breakage at all.
>
> Another idea may be to store the worktree's own normalized/absolute
> path as a second line in its .git file. It could then immediately
> detect any manual movement or renaming of itself. A minor concern,
> though, is if there are any external tools reading that file directly
> since they could be confused by the second line. Of course, such tools
> hopefully would be using "git rev-parse --git-dir", so maybe it
> wouldn't be a big deal. On the other hand, older versions of Git
> itself would be confused by the second line, so perhaps the idea isn't
> viable.

Yeah I'm a bit hesitant to break that file format. Many tools ignore
our programs and peek inside anyway, I think magit is one of them.
-- 
Duy


Re: [PATCH] xdiff: improve trimming preprocessing

2018-03-07 Thread Eric Sunshine
On Tue, Mar 6, 2018 at 6:05 PM, Jun Wu  wrote:
> Excerpts from Eric Sunshine's message of 2018-03-06 14:23:46 -0500:
>> On Tue, Mar 6, 2018 at 6:53 AM, Jun Wu  wrote:
>> > +  printf "x%.0s" {1..934} >>d # pad common suffix to 1024 bytes
>>
>> The expression {x..y} is not portable to non-POSIX shells.
>
> Is there a recommended way to generate a repetitive string?
> Maybe `seq 1000 | sed 's/.*/x/'` ?

That seems reasonable, although you'd want to use the test suite's
more portable test_seq rather than seq.

>> > +   /* prefix - need line count for xecfg->ptrimmed */
>> > +   for (i = 0; ++i < smaller && *ap == *bp;) {
>> > +   lines += (*ap == '\n');
>> > +   ap++, bp++;
>>
>> Is there a good reason for not placing 'ap++, bp++' directly in the 'for'?
>
> "lines += (*ap == '\n');" needs the "ap" before adding. Alternatives are
>
> for (i = 0; ++i < smaller && *ap == *bp; ) /* 1 */
> lines += (*ap++, *bp++) == '\n';
>
> for (i = 0; ++i < smaller && *ap == *bp; ap++, bp++) /* 2 */
> lines += (*(ap - 1) == '\n');
>
> Maybe will pick /* 1 */ to make the code shorter.

I was thinking specifically about #2 when asking the question. The
reason I asked is that, as one reading the code, the
not-quite-idiomatic loop made me stop and wonder if something special
was going on which wasn't obvious at a glance.

And, apparently, I'm still wondering since I'm not groking what you mean by:

"lines += (*ap == '\n');" needs the "ap" before adding

It's quite possible that I'm being dense and overlooking something
blindingly obvious.


Greetings From Mr. Hamidou Kader.

2018-03-07 Thread Hamidou Kader
Greetings From Mr. Hamidou Kader.

I have a Mutual/Beneficial Business Project that would be beneficial to you. I 
only have two questions to ask of you, if you are interested.

1. Can you handle this project?
2. Can I give you this trust?

Please note that the deal requires high level of maturity, honesty and secrecy. 
This will involve moving some money from my office, on trust to your hands or 
bank account. Also note that i will do everything to make sure that the money 
is moved as a purely legitimate fund, so you will not be exposed to any risk.

I request for your full co-operation. I will give you details and procedure 
when I receive your reply, to commence this transaction, I require you to 
immediately indicate your interest by a return reply.
Contact Email:  hamidoukad...@gmail.com

I will be waiting for your response in a timely manner.
Best Regard,
Mr. Hamidou Kader