Antw: Re: Terrible bad performance for it blame --date=iso -C -C master --

2017-05-03 Thread Ulrich Windl
>>> Samuel Lijin  schrieb am 03.05.2017 um 09:12 in Nachricht
:
> On Mon, May 1, 2017 at 7:59 PM, Junio C Hamano  wrote:
>> Samuel Lijin  writes:
>>
>>> On Fri, Mar 31, 2017 at 10:52 AM, Junio C Hamano  wrote:
>>>
 It might not be a bad idea to teach "blame" not to pay attention to
 any path that is marked as "-diff" (e.g. binary files) when trying
 to see if remaining contents appeared by borrowing from them.  We do
 not have that heuristics (yet).
>>>
>>> Could you elaborate on this? Do you mean to tell diffcore-rename to
>>> ignore diff_filespec objects if they're binary?
>>
>> No and yes ;-).  I do not think it is a good idea to unconditionally
>> ignore binary in diffcore-rename.
>>
>> But when we know that the rename detection is called from inside
>> blame.c, where by definition we would be digging line-oriented
>> contents, there is no point checking if the contents we are looking
>> for came from an existing binary file.
> 
> A followup thought: perhaps it would make sense for blame.c rename
> detection to only consider files with the same extension?

As traditionally file type and file name extension had little to do with each 
other in UNIX, I thing that would not be a good solution (to assume the 
opposite). I also know that Git does not care too much about hierarchies also, 
but my idea was so find candidates "nearby", i.e. look in the subdirectories of 
the first level and in the parent directory; then, if needed, widen the radius 
of the search...

Regards,
Ulrich







Re: Terrible bad performance for it blame --date=iso -C -C master --

2017-05-03 Thread Samuel Lijin
On Mon, May 1, 2017 at 7:59 PM, Junio C Hamano  wrote:
> Samuel Lijin  writes:
>
>> On Fri, Mar 31, 2017 at 10:52 AM, Junio C Hamano  wrote:
>>
>>> It might not be a bad idea to teach "blame" not to pay attention to
>>> any path that is marked as "-diff" (e.g. binary files) when trying
>>> to see if remaining contents appeared by borrowing from them.  We do
>>> not have that heuristics (yet).
>>
>> Could you elaborate on this? Do you mean to tell diffcore-rename to
>> ignore diff_filespec objects if they're binary?
>
> No and yes ;-).  I do not think it is a good idea to unconditionally
> ignore binary in diffcore-rename.
>
> But when we know that the rename detection is called from inside
> blame.c, where by definition we would be digging line-oriented
> contents, there is no point checking if the contents we are looking
> for came from an existing binary file.

A followup thought: perhaps it would make sense for blame.c rename
detection to only consider files with the same extension?


Re: Terrible bad performance for it blame --date=iso -C -C master --

2017-05-01 Thread Junio C Hamano
Samuel Lijin  writes:

> On Fri, Mar 31, 2017 at 10:52 AM, Junio C Hamano  wrote:
>
>> It might not be a bad idea to teach "blame" not to pay attention to
>> any path that is marked as "-diff" (e.g. binary files) when trying
>> to see if remaining contents appeared by borrowing from them.  We do
>> not have that heuristics (yet).
>
> Could you elaborate on this? Do you mean to tell diffcore-rename to
> ignore diff_filespec objects if they're binary?

No and yes ;-).  I do not think it is a good idea to unconditionally
ignore binary in diffcore-rename.

But when we know that the rename detection is called from inside
blame.c, where by definition we would be digging line-oriented
contents, there is no point checking if the contents we are looking
for came from an existing binary file.


Re: Terrible bad performance for it blame --date=iso -C -C master --

2017-05-01 Thread Samuel Lijin
On Fri, Mar 31, 2017 at 10:52 AM, Junio C Hamano  wrote:
> "Ulrich Windl"  writes:
>
>> I was running "vc-annotate" in Emacs for a file from a large
>> repository (>4 files, a big percentage being binary, about 10
>> commits). For the first file the result was presented rather soon, but
>> for a second file the command did not finish even after about 10
>> minutes!
>>
>> The file in question is a rather short text file (124 kB), and
>> according to git log it has one commit.
>>
>> While being bored, I did an strace of the command to find out that a
>> huge number of files is inspected.
>
> With -C -C the user (vc-annotate?) is asking to inspect huge number
> of files, to find if the contents of the file (except for the part
> that came from its own previous version) came from other existing
> files.  So this is very much expected.
>
> It might not be a bad idea to teach "blame" not to pay attention to
> any path that is marked as "-diff" (e.g. binary files) when trying
> to see if remaining contents appeared by borrowing from them.  We do
> not have that heuristics (yet).

Could you elaborate on this? Do you mean to tell diffcore-rename to
ignore diff_filespec objects if they're binary?


Antw: Re: Terrible bad performance for it blame --date=iso -C

2017-04-03 Thread Ulrich Windl
>>> Jakub Narebski  schrieb am 03.04.2017 um 17:16 in
Nachricht
<0ccc5cab-26b7-4b02-b964-452b61e92...@gmail.com>:
> W dniu 03.04.2017 o 12:56, SZEDER Gábor pisze:
>> Ulrich Windl wrote:
> 
>>> In the other case (for the user bored of waiting seeking for some
>>> entertainment ;-)) a "-v (verbose) option could be useful.  Or at the
>>> very least: If git is expecting that some operation will take (or
>>> already did take) a lot of time, give some message explaining why it
>>> is taking a lot of time, and maybe how to avoid that.
>> 
>> It already does so by default since v2.8.0, see aba37f495 (blame: add
>> support for --[no-]progress option, 2015-12-12).
>> 
>>   $ time git blame sha1_file.c |wc -l
>>   4026
>>   
>>   real0m1.744s
>>   user0m1.672s
>>   sys 0m0.068s
>>   $ time git blame -C -C sha1_file.c |wc -l
>>   Blaming lines: 100% (4026/4026), done.
>>   4026
>>   
>>   real0m3.832s
>>   user0m3.716s
>>   sys 0m0.112s
>> 
>> However, after a short peek at that commit, it only displays progress
>> by default when stderr is a terminal, which might not be the case when
>> invoked from emacs.
> 
> Emacs (magit?) should use `git blame --porcelain`, and do its own
> progress report, just like 'git gui blame' and incremental blame mode
> of gitweb.

I was thinking similar: The pain vc-annotate obviously should work without
those "-C" options, and with prefix argument (C-u  in Emacs) it could
start looking for copied stuff. HMO...

Worse than no progress reporting is the inability to kill the process (if you
run out of patience) with C-g (that stops most commands in Emacs).

> 
> Actually... there already is git-blamed - Minor mode for incremental
> blame for Git, and mo-git-blame - An interactive, iterative 'git blame'
> mode for Emacs, both available on ELPA (Emacs Lisp Package Archive).

I confess taht is still use RCS from time to time, and I prefer the
higher-level Emacs commands ;-)

Regards,
Ulrich



Re: Terrible bad performance for it blame --date=iso -C

2017-04-03 Thread Jakub Narębski
W dniu 03.04.2017 o 12:56, SZEDER Gábor pisze:
> Ulrich Windl wrote:

>> In the other case (for the user bored of waiting seeking for some
>> entertainment ;-)) a "-v (verbose) option could be useful.  Or at the
>> very least: If git is expecting that some operation will take (or
>> already did take) a lot of time, give some message explaining why it
>> is taking a lot of time, and maybe how to avoid that.
> 
> It already does so by default since v2.8.0, see aba37f495 (blame: add
> support for --[no-]progress option, 2015-12-12).
> 
>   $ time git blame sha1_file.c |wc -l
>   4026
>   
>   real0m1.744s
>   user0m1.672s
>   sys 0m0.068s
>   $ time git blame -C -C sha1_file.c |wc -l
>   Blaming lines: 100% (4026/4026), done.
>   4026
>   
>   real0m3.832s
>   user0m3.716s
>   sys 0m0.112s
> 
> However, after a short peek at that commit, it only displays progress
> by default when stderr is a terminal, which might not be the case when
> invoked from emacs.

Emacs (magit?) should use `git blame --porcelain`, and do its own
progress report, just like 'git gui blame' and incremental blame mode
of gitweb.

Actually... there already is git-blamed - Minor mode for incremental
blame for Git, and mo-git-blame - An interactive, iterative 'git blame'
mode for Emacs, both available on ELPA (Emacs Lisp Package Archive).

HTH
-- 
Jakub Narębski



Re: Terrible bad performance for it blame --date=iso -C

2017-04-03 Thread SZEDER Gábor
> In the other case (for the user bored of waiting seeking for some
> entertainment ;-)) a "-v (verbose) option could be useful.  Or at the
> very least: If git is expecting that some operation will take (or
> already did take) a lot of time, give some message explaining why it
> is taking a lot of time, and maybe how to avoid that.

It already does so by default since v2.8.0, see aba37f495 (blame: add
support for --[no-]progress option, 2015-12-12).

  $ time git blame sha1_file.c |wc -l
  4026
  
  real0m1.744s
  user0m1.672s
  sys 0m0.068s
  $ time git blame -C -C sha1_file.c |wc -l
  Blaming lines: 100% (4026/4026), done.
  4026
  
  real0m3.832s
  user0m3.716s
  sys 0m0.112s

However, after a short peek at that commit, it only displays progress
by default when stderr is a terminal, which might not be the case when
invoked from emacs.



Antw: Re: Terrible bad performance for it blame --date=iso -C -C master --

2017-04-02 Thread Ulrich Windl
Junio,

thanks for explaining! So if there are at least two commits, blame is fast, but 
with only one commit blame tries hard to find another commit that might have 
contributed to the one file?
I verified that without those "-C" options the result is very quick.
In the other case (for the user bored of waiting seeking for some entertainment 
;-)) a "-v (verbose) option could be useful.
Or at the very least: If git is expecting that some operation will take (or 
already did take) a lot of time, give some message explaining why it is taking 
a lot of time, and maybe how to avoid that.

Regards,
Ulrich


>>> Junio C Hamano  schrieb am 31.03.2017 um 17:52 in 
>>> Nachricht
:
> "Ulrich Windl"  writes:
> 
>> I was running "vc-annotate" in Emacs for a file from a large
>> repository (>4 files, a big percentage being binary, about 10
>> commits). For the first file the result was presented rather soon, but
>> for a second file the command did not finish even after about 10
>> minutes!
>>
>> The file in question is a rather short text file (124 kB), and
>> according to git log it has one commit.
>>
>> While being bored, I did an strace of the command to find out that a
>> huge number of files is inspected.
> 
> With -C -C the user (vc-annotate?) is asking to inspect huge number
> of files, to find if the contents of the file (except for the part
> that came from its own previous version) came from other existing
> files.  So this is very much expected.
> 
> It might not be a bad idea to teach "blame" not to pay attention to
> any path that is marked as "-diff" (e.g. binary files) when trying
> to see if remaining contents appeared by borrowing from them.  We do
> not have that heuristics (yet).






Re: Terrible bad performance for it blame --date=iso -C -C master --

2017-03-31 Thread Samuel Lijin
Hi Ulrich,

Is there any chance you could share the repo where this is coming from?

This is actually something a colleague and I are looking into seeing
if we can crunch out some performance gains since -C -C isn't
threaded.

Sam

On Fri, Mar 31, 2017 at 10:52 AM, Junio C Hamano  wrote:
> "Ulrich Windl"  writes:
>
>> I was running "vc-annotate" in Emacs for a file from a large
>> repository (>4 files, a big percentage being binary, about 10
>> commits). For the first file the result was presented rather soon, but
>> for a second file the command did not finish even after about 10
>> minutes!
>>
>> The file in question is a rather short text file (124 kB), and
>> according to git log it has one commit.
>>
>> While being bored, I did an strace of the command to find out that a
>> huge number of files is inspected.
>
> With -C -C the user (vc-annotate?) is asking to inspect huge number
> of files, to find if the contents of the file (except for the part
> that came from its own previous version) came from other existing
> files.  So this is very much expected.
>
> It might not be a bad idea to teach "blame" not to pay attention to
> any path that is marked as "-diff" (e.g. binary files) when trying
> to see if remaining contents appeared by borrowing from them.  We do
> not have that heuristics (yet).


Re: Terrible bad performance for it blame --date=iso -C -C master --

2017-03-31 Thread Junio C Hamano
"Ulrich Windl"  writes:

> I was running "vc-annotate" in Emacs for a file from a large
> repository (>4 files, a big percentage being binary, about 10
> commits). For the first file the result was presented rather soon, but
> for a second file the command did not finish even after about 10
> minutes!
>
> The file in question is a rather short text file (124 kB), and
> according to git log it has one commit.
>
> While being bored, I did an strace of the command to find out that a
> huge number of files is inspected.

With -C -C the user (vc-annotate?) is asking to inspect huge number
of files, to find if the contents of the file (except for the part
that came from its own previous version) came from other existing
files.  So this is very much expected.

It might not be a bad idea to teach "blame" not to pay attention to
any path that is marked as "-diff" (e.g. binary files) when trying
to see if remaining contents appeared by borrowing from them.  We do
not have that heuristics (yet).