Re: [PATCH] revset: introduce the summary predicate

2017-01-16 Thread Yuya Nishihara
On Wed, 11 Jan 2017 23:15:28 +0900, Yuya Nishihara wrote:
> On Tue, 10 Jan 2017 20:09:45 -0500, Matt Harbison wrote:
> > >> I've missed that '{firstline}' proposal from Sean, can you point me at  
> > >> it? (or summarize it ?)
> > >
> > > Not a proposal, so much as historical knowledge about the template?
> > >
> > > https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-January/092099.html
> 
> I thought that the proposal was something like
> search_in_template_output('re:issue\d+', '{desc|firstline'}').

For future readers, this could be a security risk (without safe=False),
because:

 a) hgweb supports most revsets (which are marked safe=True.)
 b) template may expose more information than revset knows (e.g. we might
add {env} keyword)
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] revset: introduce the summary predicate

2017-01-12 Thread Yuya Nishihara
On Thu, 12 Jan 2017 00:40:36 -0500, Matt Harbison wrote:
> On Wed, 11 Jan 2017 09:15:28 -0500, Yuya Nishihara  wrote:
> > I also like the idea of deprecating
> > grep() since grep() sounds like searching the file contents.
> 
> I don't have a strong feeling on that.  If somebody makes a new search  
> function though, I wonder if it should be like revset.matching() (but with  
> stringmatcher support), where the user can control the fields searched, in  
> order to avoid this sort of ambiguity.  I wouldn't want to fold grep()  
> into author() because of the clashing case sensitive/insensitive you  
> mention below.

Fair enough. (and my feeling wasn't strong neither.)

> >> That leaves only 'author' and 'desc' as not providing the full
> >> functionality of the others.
> >>
> >> > C) If we do A + B, that means 'desc' is the only oddball left.  I  
> >> don't
> >> > like the idea that case sensitivity for a raw pattern and a 'literal:'
> >> > prefixed pattern would differ.  They are both literals in my mind, and
> >> > it would be the one remaining exception.  The 're:' prefix could  
> >> follow
> >> > regular rules.
> >
> > I won't insist that 'literal:' must be case-sensitive (because of (A).)
> > However, I would guess 're:' is also case-insensitive if 'literal:' is.
> > In my mindset, desc() is a case-insensitive matcher in that case.
> >
> > So I lean towards adding case-insensitive desc('re:'), which would be at  
> > least
> > consistent in that desc() always ignores cases.
> 
> The only reason I would guess 're:' is case sensitive, is because I've  
> never run into one that hasn't been.  I do like the consistency argument  
> though, so let's try that.  I wonder if in addition to the  
> 'icase-literal:' you suggested, if this also means we need a 'case-re:',  
> since it doesn't look like you can force re.I off.  I don't see any real  
> benefit for author(), but I can maybe see it for desc().  The series I'm  
> about to submit hints at the ability to add these with one or two lines.   
> See what you think.

Honestly I just postponed the consideration about case-sensitive desc() by
this design. ;) I'm not a fan of 'case-re:' because 're:' is case-sensitive
in most revset functions. I'd rather add new case-sensitive desc() or more
general function.

I found (?-i:...) syntax, but that's Python 3.6 thing, sigh.

https://docs.python.org/3/library/re.html
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] revset: introduce the summary predicate

2017-01-11 Thread Matt Harbison

On Wed, 11 Jan 2017 09:15:28 -0500, Yuya Nishihara  wrote:


On Tue, 10 Jan 2017 20:09:45 -0500, Matt Harbison wrote:
On Mon, 09 Jan 2017 23:33:17 -0500, Matt Harbison  


wrote:
> On Mon, 09 Jan 2017 05:49:23 -0500, Pierre-Yves David
>> * predicate
>> * default behavior
>> * support 'rich' stringmatcher ?
>> * are 'literal:' case sensitive ?
>> * are 're:' case sensitive (and supported at all) ?
>
> TL;DR: desc, grep, keyword, and author/user are the oddballs.  grep  
and

> keyword are the regex and literal halves respectively, of the same
> search.
>
> After stripping out non string predicates, we basically end up with 4
> groups:
>
> - util.stringmatcher based:
>
>  Predicate:Case Sensitive?
>  "author"   N
>  "user" N
>  "bookmark" Y
>  "branch"   Y
>  "extra"Y
>  "named"Y
>  "subrepo"  [1] Y
>  "tag"  Y
>
> These all support 'literal:' and 're:'.  Case sensitivity applies the
> same to both prefixes, and raw pattern.
>
> [1] Not documented to support 'literal:' or 're:' prefixes.
>
>
> - Local method implementation based:
>
>  Predicate:Case Sensitive?
>  "desc" N
>  "grep" Y
>  "keyword"  N
>
> None of these support prefixes.  The grep param is a regex without
> 're:', so it doesn't make a lot of sense to support stringmatcher  
here-
> what stringmatcher thinks is literal is really regex.  If we  
internally

> bolt on 're:', it still can't support literal matches.
>
>
> - match.py based (not relevant, but for completeness):
>
>  "adds", "contains", "file", "filelog", "follow",
>  "followlines", "modifies", "removes"
>
> - "bisect" (I can't see how 're:' support here would be meaningful.)
>
>
>>  From there we'll be able to see if a pattern emerge and pick the  
best

>> way to move forward.


Thanks for the nice summary. Inline comments follow.


> The following thoughts come to mind:
>
> A) author/user has been thoroughly corrupted with the lower casing.
> Maybe come up with a 3rd one that follows modern rules, and
> deprecate/hide these?  Sad, because these seem natural.  OTOH, having
> raw pattern and prefixed patterns behave the same everywhere is  
elegant

> and simple.  Not sure what to call it.  'committer' comes to mind, but
> that has git implications.  That was suggested when I proposed  
recording
> who performed a graft in the 'extra' dict.  Then it was pointed out  
this
> tracking was related to the chain of custody proposal by Greg, so I  
let
> it drop [1].  My enthusiasm was dampened some when I saw templates  
also
> have {author} and {user}, though the latter is a filter, not a  
reference

> to the field.
>
> [1]
>  
https://www.mercurial-scm.org/pipermail/mercurial-devel/2015-April/068692.html


Yeah, that would be unfortunate to deprecate author/user() since they  
are very
common terms. Also, adding case-sensitive author() would not provide  
much user

benefit even though that could help eliminating the inconsistency.

So I'm -1 for (A).


Yeah... And given what Augie said, I can agree with that.

On the other hand, I think we should fix the re: case to not lowercase  
the

pattern but to perform case-insensitive matching so '\B' won't be '\b'.


> B) We should maybe fold grep and keyword into a new predicate
> ('search'?) that follows modern formatting, and deprecate/hide these
> two.  Simple, and drops the visible revset count by 1.  I'm not sure  
if

> 'grep' was borrowed  from git, or if it's just inspired by the unix
> command.

I had second thoughts about this.  I think if we just add documentation  
to

'keyword' that says "If you want to search these fields by regex or case
sensitively, use 'grep'".  Then the user can see how to get everything
stringmatcher will provide, and it makes it clear to developers that
'keyword' doesn't need string stringmatcher support.  I'll submit a  
patch

during the freeze.


The documentation thing sounds nice.


I threw it into this next series, since it's pretty trivial.


I also like the idea of deprecating
grep() since grep() sounds like searching the file contents.


I don't have a strong feeling on that.  If somebody makes a new search  
function though, I wonder if it should be like revset.matching() (but with  
stringmatcher support), where the user can control the fields searched, in  
order to avoid this sort of ambiguity.  I wouldn't want to fold grep()  
into author() because of the clashing case sensitive/insensitive you  
mention below.



That leaves only 'author' and 'desc' as not providing the full
functionality of the others.

> C) If we do A + B, that means 'desc' is the only oddball left.  I  
don't

> like the idea that case sensitivity for a raw pattern and a 'literal:'
> prefixed pattern would 

Re: [PATCH] revset: introduce the summary predicate

2017-01-11 Thread Yuya Nishihara
On Tue, 10 Jan 2017 20:09:45 -0500, Matt Harbison wrote:
> On Mon, 09 Jan 2017 23:33:17 -0500, Matt Harbison   
> wrote:
> > On Mon, 09 Jan 2017 05:49:23 -0500, Pierre-Yves David  
> >> * predicate
> >> * default behavior
> >> * support 'rich' stringmatcher ?
> >> * are 'literal:' case sensitive ?
> >> * are 're:' case sensitive (and supported at all) ?
> >
> > TL;DR: desc, grep, keyword, and author/user are the oddballs.  grep and  
> > keyword are the regex and literal halves respectively, of the same  
> > search.
> >
> > After stripping out non string predicates, we basically end up with 4  
> > groups:
> >
> > - util.stringmatcher based:
> >
> >  Predicate:Case Sensitive?
> >  "author"   N
> >  "user" N
> >  "bookmark" Y
> >  "branch"   Y
> >  "extra"Y
> >  "named"Y
> >  "subrepo"  [1] Y
> >  "tag"  Y
> >
> > These all support 'literal:' and 're:'.  Case sensitivity applies the  
> > same to both prefixes, and raw pattern.
> >
> > [1] Not documented to support 'literal:' or 're:' prefixes.
> >
> >
> > - Local method implementation based:
> >
> >  Predicate:Case Sensitive?
> >  "desc" N
> >  "grep" Y
> >  "keyword"  N
> >
> > None of these support prefixes.  The grep param is a regex without  
> > 're:', so it doesn't make a lot of sense to support stringmatcher here-  
> > what stringmatcher thinks is literal is really regex.  If we internally  
> > bolt on 're:', it still can't support literal matches.
> >
> >
> > - match.py based (not relevant, but for completeness):
> >
> >  "adds", "contains", "file", "filelog", "follow",
> >  "followlines", "modifies", "removes"
> >
> > - "bisect" (I can't see how 're:' support here would be meaningful.)
> >
> >
> >>  From there we'll be able to see if a pattern emerge and pick the best  
> >> way to move forward.

Thanks for the nice summary. Inline comments follow.

> > The following thoughts come to mind:
> >
> > A) author/user has been thoroughly corrupted with the lower casing.   
> > Maybe come up with a 3rd one that follows modern rules, and  
> > deprecate/hide these?  Sad, because these seem natural.  OTOH, having  
> > raw pattern and prefixed patterns behave the same everywhere is elegant  
> > and simple.  Not sure what to call it.  'committer' comes to mind, but  
> > that has git implications.  That was suggested when I proposed recording  
> > who performed a graft in the 'extra' dict.  Then it was pointed out this  
> > tracking was related to the chain of custody proposal by Greg, so I let  
> > it drop [1].  My enthusiasm was dampened some when I saw templates also  
> > have {author} and {user}, though the latter is a filter, not a reference  
> > to the field.
> >
> > [1]  
> > https://www.mercurial-scm.org/pipermail/mercurial-devel/2015-April/068692.html

Yeah, that would be unfortunate to deprecate author/user() since they are very
common terms. Also, adding case-sensitive author() would not provide much user
benefit even though that could help eliminating the inconsistency.

So I'm -1 for (A).

On the other hand, I think we should fix the re: case to not lowercase the
pattern but to perform case-insensitive matching so '\B' won't be '\b'.

> > B) We should maybe fold grep and keyword into a new predicate  
> > ('search'?) that follows modern formatting, and deprecate/hide these  
> > two.  Simple, and drops the visible revset count by 1.  I'm not sure if  
> > 'grep' was borrowed  from git, or if it's just inspired by the unix  
> > command.
> 
> I had second thoughts about this.  I think if we just add documentation to  
> 'keyword' that says "If you want to search these fields by regex or case  
> sensitively, use 'grep'".  Then the user can see how to get everything  
> stringmatcher will provide, and it makes it clear to developers that  
> 'keyword' doesn't need string stringmatcher support.  I'll submit a patch  
> during the freeze.

The documentation thing sounds nice. I also like the idea of deprecating
grep() since grep() sounds like searching the file contents.

> That leaves only 'author' and 'desc' as not providing the full  
> functionality of the others.
> 
> > C) If we do A + B, that means 'desc' is the only oddball left.  I don't  
> > like the idea that case sensitivity for a raw pattern and a 'literal:'  
> > prefixed pattern would differ.  They are both literals in my mind, and  
> > it would be the one remaining exception.  The 're:' prefix could follow  
> > regular rules.

I won't insist that 'literal:' must be case-sensitive (because of (A).)
However, I would guess 're:' is also case-insensitive if 'literal:' is.
In my mindset, desc() is a case-insensitive matcher in that case.

So I lean towards adding 

Re: [PATCH] revset: introduce the summary predicate

2017-01-10 Thread Matt Harbison
On Mon, 09 Jan 2017 23:33:17 -0500, Matt Harbison   
wrote:


On Mon, 09 Jan 2017 05:49:23 -0500, Pierre-Yves David  
 wrote:





On 01/08/2017 09:34 PM, Matt Harbison wrote:

On Sun, 08 Jan 2017 07:59:36 -0500, Pierre-Yves David
 wrote:


(ha, I wrote my previous reply in a train and it got sent when I
connected again (and received that one). I'm going to try to adress
the new content in this email and sometime repeat some of my other
reply content for clarity)

On 01/08/2017 04:23 AM, Matt Harbison wrote:

On Sat, 07 Jan 2017 02:56:48 -0500, Yuya Nishihara 
wrote:


On Fri, 6 Jan 2017 21:29:43 -0500, Matt Harbison wrote:

> On Jan 6, 2017, at 11:19 AM, Pierre-Yves David
 wrote:
>> On 01/04/2017 07:04 PM, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison 
>> # Date 1483550016 18000
>> #  Wed Jan 04 12:13:36 2017 -0500
>> # Node ID 76d95ab94b9e206363629059fb7824002e19a9e5
>> # Parent  0064a1eb28e246ded9b726c696d048143d1b23f1
>> revset: introduce the summary predicate
>>



Perhaps stringmatcher can have 3 types, icase literal, literal, and
re, and
the default of desc() is icase literal for backward compatibility.  
You

can
build a case-insensitive regexp object from a literal pattern.

https://docs.python.org/2/library/re.html#re.I


Yep, that's the API I was thinking of.

I'm confused by the rest of your comments.  When I first skimmed your
message, adding support for 'icasere:' using this API popped into my
mind.  And that could support a case insensitive literal, because
'icasere:foo' should be equivalent to looking for the substring 'foo'
(leaving aside efficiency, how discoverable that is, and that
stringmatcher matches the whole string for literals).  But you seem  
to

be suggesting adding 'icaseliteral:'.


I'm not 100% sure of what Yuya actually has in mind but here is my
understanding of the situation and how we could move forward.

Currently:
--

   desc(X) → X is customly matched as a case insensitive litteral,

   We have a "generic" pattern definition syntax used by various other
reveset (implemented in "stringmatcher")

 foo(X)
   → X is matched as a case sensitive litteral
 foo('literal:X')
   → X is matched as a case sensitive literal (same as the above)
 food('re:X')
   → X is matched as a regular expression (case sensitive)

Proposal: (might be what yuya says)
-

extend the string matcher to

   foo('literal:X')
 → X is matched as a case sensitive literal


See the comment in the new patch I sent about 'user()' already
lowercasing 'literal:' and 're:'.  I'd consider it a bug, but it's been
in since mid 2012.  Attempting to channel Matt, I'm guessing we are
stuck with that since it is so old, but wanted to see what others  
think.


1) Yep, we are stuck with whatever existing behavior we have for  
existing predicate because of BC. (but we can augment it)


2) Congratulation you seems to have unearthed an area where we have  
many predicated with close but slightly different behavior. At that  
point I'll ask you an inventory of what we currently have so that we  
can devise a sound and as consistent as possible way forward.


   Can you provide us with a table that at least keep track of:

* predicate
* default behavior
* support 'rich' stringmatcher ?
* are 'literal:' case sensitive ?
* are 're:' case sensitive (and supported at all) ?


TL;DR: desc, grep, keyword, and author/user are the oddballs.  grep and  
keyword are the regex and literal halves respectively, of the same  
search.


After stripping out non string predicates, we basically end up with 4  
groups:


- util.stringmatcher based:

 Predicate:Case Sensitive?
 "author"   N
 "user" N
 "bookmark" Y
 "branch"   Y
 "extra"Y
 "named"Y
 "subrepo"  [1] Y
 "tag"  Y

These all support 'literal:' and 're:'.  Case sensitivity applies the  
same to both prefixes, and raw pattern.


[1] Not documented to support 'literal:' or 're:' prefixes.


- Local method implementation based:

 Predicate:Case Sensitive?
 "desc" N
 "grep" Y
 "keyword"  N

None of these support prefixes.  The grep param is a regex without  
're:', so it doesn't make a lot of sense to support stringmatcher here-  
what stringmatcher thinks is literal is really regex.  If we internally  
bolt on 're:', it still can't support literal matches.



- match.py based (not relevant, but for completeness):

 "adds", "contains", "file", "filelog", "follow",
 "followlines", "modifies", "removes"

- "bisect" (I can't see how 're:' support here would be meaningful.)


 From there 

Re: [PATCH] revset: introduce the summary predicate

2017-01-09 Thread Matt Harbison
On Mon, 09 Jan 2017 05:49:23 -0500, Pierre-Yves David  
 wrote:





On 01/08/2017 09:34 PM, Matt Harbison wrote:

On Sun, 08 Jan 2017 07:59:36 -0500, Pierre-Yves David
 wrote:


(ha, I wrote my previous reply in a train and it got sent when I
connected again (and received that one). I'm going to try to adress
the new content in this email and sometime repeat some of my other
reply content for clarity)

On 01/08/2017 04:23 AM, Matt Harbison wrote:

On Sat, 07 Jan 2017 02:56:48 -0500, Yuya Nishihara 
wrote:


On Fri, 6 Jan 2017 21:29:43 -0500, Matt Harbison wrote:

> On Jan 6, 2017, at 11:19 AM, Pierre-Yves David
 wrote:
>> On 01/04/2017 07:04 PM, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison 
>> # Date 1483550016 18000
>> #  Wed Jan 04 12:13:36 2017 -0500
>> # Node ID 76d95ab94b9e206363629059fb7824002e19a9e5
>> # Parent  0064a1eb28e246ded9b726c696d048143d1b23f1
>> revset: introduce the summary predicate
>>



Perhaps stringmatcher can have 3 types, icase literal, literal, and
re, and
the default of desc() is icase literal for backward compatibility.  
You

can
build a case-insensitive regexp object from a literal pattern.

https://docs.python.org/2/library/re.html#re.I


Yep, that's the API I was thinking of.

I'm confused by the rest of your comments.  When I first skimmed your
message, adding support for 'icasere:' using this API popped into my
mind.  And that could support a case insensitive literal, because
'icasere:foo' should be equivalent to looking for the substring 'foo'
(leaving aside efficiency, how discoverable that is, and that
stringmatcher matches the whole string for literals).  But you seem to
be suggesting adding 'icaseliteral:'.


I'm not 100% sure of what Yuya actually has in mind but here is my
understanding of the situation and how we could move forward.

Currently:
--

   desc(X) → X is customly matched as a case insensitive litteral,

   We have a "generic" pattern definition syntax used by various other
reveset (implemented in "stringmatcher")

 foo(X)
   → X is matched as a case sensitive litteral
 foo('literal:X')
   → X is matched as a case sensitive literal (same as the above)
 food('re:X')
   → X is matched as a regular expression (case sensitive)

Proposal: (might be what yuya says)
-

extend the string matcher to

   foo('literal:X')
 → X is matched as a case sensitive literal


See the comment in the new patch I sent about 'user()' already
lowercasing 'literal:' and 're:'.  I'd consider it a bug, but it's been
in since mid 2012.  Attempting to channel Matt, I'm guessing we are
stuck with that since it is so old, but wanted to see what others think.


1) Yep, we are stuck with whatever existing behavior we have for  
existing predicate because of BC. (but we can augment it)


2) Congratulation you seems to have unearthed an area where we have many  
predicated with close but slightly different behavior. At that point  
I'll ask you an inventory of what we currently have so that we can  
devise a sound and as consistent as possible way forward.


   Can you provide us with a table that at least keep track of:

* predicate
* default behavior
* support 'rich' stringmatcher ?
* are 'literal:' case sensitive ?
* are 're:' case sensitive (and supported at all) ?


TL;DR: desc, grep, keyword, and author/user are the oddballs.  grep and  
keyword are the regex and literal halves respectively, of the same search.


After stripping out non string predicates, we basically end up with 4  
groups:


- util.stringmatcher based:

Predicate:Case Sensitive?
"author"   N
"user" N
"bookmark" Y
"branch"   Y
"extra"Y
"named"Y
"subrepo"  [1] Y
"tag"  Y

These all support 'literal:' and 're:'.  Case sensitivity applies the same  
to both prefixes, and raw pattern.


[1] Not documented to support 'literal:' or 're:' prefixes.


- Local method implementation based:

Predicate:Case Sensitive?
"desc" N
"grep" Y
"keyword"  N

None of these support prefixes.  The grep param is a regex without 're:',  
so it doesn't make a lot of sense to support stringmatcher here- what  
stringmatcher thinks is literal is really regex.  If we internally bolt on  
're:', it still can't support literal matches.



- match.py based (not relevant, but for completeness):

"adds", "contains", "file", "filelog", "follow",
"followlines", "modifies", "removes"

- "bisect" (I can't see how 're:' support here would be meaningful.)


 From there we'll be able to see if a pattern emerge and pick the best  
way to move forward.


The following thoughts 

Re: [PATCH] revset: introduce the summary predicate

2017-01-09 Thread Augie Fackler
On Sun, Jan 08, 2017 at 03:34:13PM -0500, Matt Harbison wrote:
> On Sun, 08 Jan 2017 07:59:36 -0500, Pierre-Yves David
>  wrote:
>
> > (ha, I wrote my previous reply in a train and it got sent when I
> > connected again (and received that one). I'm going to try to adress the
> > new content in this email and sometime repeat some of my other reply
> > content for clarity)
> >
> > On 01/08/2017 04:23 AM, Matt Harbison wrote:

[...]

> > I'm not 100% sure of what Yuya actually has in mind but here is my
> > understanding of the situation and how we could move forward.
> >
> > Currently:
> > --
> >
> >desc(X) → X is customly matched as a case insensitive litteral,
> >
> >We have a "generic" pattern definition syntax used by various other
> > reveset (implemented in "stringmatcher")
> >
> >  foo(X)
> >→ X is matched as a case sensitive litteral
> >  foo('literal:X')
> >→ X is matched as a case sensitive literal (same as the above)
> >  food('re:X')
> >→ X is matched as a regular expression (case sensitive)
> >
> > Proposal: (might be what yuya says)
> > -
> >
> > extend the string matcher to
> >
> >foo('literal:X')
> >  → X is matched as a case sensitive literal
>
> See the comment in the new patch I sent about 'user()' already lowercasing
> 'literal:' and 're:'.  I'd consider it a bug, but it's been in since mid
> 2012.  Attempting to channel Matt, I'm guessing we are stuck with that since
> it is so old, but wanted to see what others think.

Guessing at motivations for user() lowercasing: most email hosts are
case-insensitive for the user part, even though rfc(2)822 doesn't
require it.

(Mostly stating this so that there's some trail of pondering if this
ever comes up in the future.)
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] revset: introduce the summary predicate

2017-01-09 Thread Pierre-Yves David



On 01/08/2017 09:34 PM, Matt Harbison wrote:

On Sun, 08 Jan 2017 07:59:36 -0500, Pierre-Yves David
 wrote:


(ha, I wrote my previous reply in a train and it got sent when I
connected again (and received that one). I'm going to try to adress
the new content in this email and sometime repeat some of my other
reply content for clarity)

On 01/08/2017 04:23 AM, Matt Harbison wrote:

On Sat, 07 Jan 2017 02:56:48 -0500, Yuya Nishihara 
wrote:


On Fri, 6 Jan 2017 21:29:43 -0500, Matt Harbison wrote:

> On Jan 6, 2017, at 11:19 AM, Pierre-Yves David
 wrote:
>> On 01/04/2017 07:04 PM, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison 
>> # Date 1483550016 18000
>> #  Wed Jan 04 12:13:36 2017 -0500
>> # Node ID 76d95ab94b9e206363629059fb7824002e19a9e5
>> # Parent  0064a1eb28e246ded9b726c696d048143d1b23f1
>> revset: introduce the summary predicate
>>



Perhaps stringmatcher can have 3 types, icase literal, literal, and
re, and
the default of desc() is icase literal for backward compatibility. You
can
build a case-insensitive regexp object from a literal pattern.

https://docs.python.org/2/library/re.html#re.I


Yep, that's the API I was thinking of.

I'm confused by the rest of your comments.  When I first skimmed your
message, adding support for 'icasere:' using this API popped into my
mind.  And that could support a case insensitive literal, because
'icasere:foo' should be equivalent to looking for the substring 'foo'
(leaving aside efficiency, how discoverable that is, and that
stringmatcher matches the whole string for literals).  But you seem to
be suggesting adding 'icaseliteral:'.


I'm not 100% sure of what Yuya actually has in mind but here is my
understanding of the situation and how we could move forward.

Currently:
--

   desc(X) → X is customly matched as a case insensitive litteral,

   We have a "generic" pattern definition syntax used by various other
reveset (implemented in "stringmatcher")

 foo(X)
   → X is matched as a case sensitive litteral
 foo('literal:X')
   → X is matched as a case sensitive literal (same as the above)
 food('re:X')
   → X is matched as a regular expression (case sensitive)

Proposal: (might be what yuya says)
-

extend the string matcher to

   foo('literal:X')
 → X is matched as a case sensitive literal


See the comment in the new patch I sent about 'user()' already
lowercasing 'literal:' and 're:'.  I'd consider it a bug, but it's been
in since mid 2012.  Attempting to channel Matt, I'm guessing we are
stuck with that since it is so old, but wanted to see what others think.


1) Yep, we are stuck with whatever existing behavior we have for 
existing predicate because of BC. (but we can augment it)


2) Congratulation you seems to have unearthed an area where we have many 
predicated with close but slightly different behavior. At that point 
I'll ask you an inventory of what we currently have so that we can 
devise a sound and as consistent as possible way forward.


  Can you provide us with a table that at least keep track of:

* predicate
* default behavior
* support 'rich' stringmatcher ?
* are 'literal:' case sensitive ?
* are 're:' case sensitive (and supported at all) ?

From there we'll be able to see if a pattern emerge and pick the best 
way to move forward.



   foo('icase-literal:X')
 → X is matched as a case insensitive literal
   food('re:X')
 → X is matched as a regular expression (case sensitive)

Then, desc move to use string matcher (default to "icase-literal").

We do not need a 'icase-re:' spec, because one can easily achieve it
using 're:(?i)foo'


Ah! I missed the part in the docs where flags could be set in the string
with (?). I thought you needed to compile with re.FLAG.  When he
said string literal, my mind went right to the 'literal:' prefix.
Agreed, no need for 'icase-re:'.


Someone getting slightly confused with regular expression? Impossible! ;-)


[…]
I'm about to submit a patch to add the current 're:' support to 'desc'
in the meantime, to hopefully move this along.


Great!


 I'd also be curious if
you have thoughts on how to conditionally limit this predicate to the
first line, without limiting future functionality.


So having digged the regexp part a bit more, it seems like one could
just use 're:^.*issue1337' to match "issue1337" on the first line ('.'
does not match new-line by default).


Thanks for looking at that.  It's way less horrible than I thought it
would be.  I'm curious what Sean thinks, since he mentioned {firstline}
being put in as a substitute for a complex regex.  I'd be fine with
skipping the firstline=True param if this case is mentioned in the help
for desc().


I've missed that '{firstline}' proposal from Sean, can you point me at 
it? (or summarize it ?)


Thanks a lot for looking into this!

Cheers,

--
Pierre-Yves David

Re: [PATCH] revset: introduce the summary predicate

2017-01-08 Thread Matt Harbison
On Sun, 08 Jan 2017 07:59:36 -0500, Pierre-Yves David  
 wrote:


(ha, I wrote my previous reply in a train and it got sent when I  
connected again (and received that one). I'm going to try to adress the  
new content in this email and sometime repeat some of my other reply  
content for clarity)


On 01/08/2017 04:23 AM, Matt Harbison wrote:
On Sat, 07 Jan 2017 02:56:48 -0500, Yuya Nishihara   
wrote:



On Fri, 6 Jan 2017 21:29:43 -0500, Matt Harbison wrote:

> On Jan 6, 2017, at 11:19 AM, Pierre-Yves David
 wrote:
>> On 01/04/2017 07:04 PM, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison 
>> # Date 1483550016 18000
>> #  Wed Jan 04 12:13:36 2017 -0500
>> # Node ID 76d95ab94b9e206363629059fb7824002e19a9e5
>> # Parent  0064a1eb28e246ded9b726c696d048143d1b23f1
>> revset: introduce the summary predicate
>>



Perhaps stringmatcher can have 3 types, icase literal, literal, and
re, and
the default of desc() is icase literal for backward compatibility. You
can
build a case-insensitive regexp object from a literal pattern.

https://docs.python.org/2/library/re.html#re.I


Yep, that's the API I was thinking of.

I'm confused by the rest of your comments.  When I first skimmed your
message, adding support for 'icasere:' using this API popped into my
mind.  And that could support a case insensitive literal, because
'icasere:foo' should be equivalent to looking for the substring 'foo'
(leaving aside efficiency, how discoverable that is, and that
stringmatcher matches the whole string for literals).  But you seem to
be suggesting adding 'icaseliteral:'.


I'm not 100% sure of what Yuya actually has in mind but here is my  
understanding of the situation and how we could move forward.


Currently:
--

   desc(X) → X is customly matched as a case insensitive litteral,

   We have a "generic" pattern definition syntax used by various other  
reveset (implemented in "stringmatcher")


 foo(X)
   → X is matched as a case sensitive litteral
 foo('literal:X')
   → X is matched as a case sensitive literal (same as the above)
 food('re:X')
   → X is matched as a regular expression (case sensitive)

Proposal: (might be what yuya says)
-

extend the string matcher to

   foo('literal:X')
 → X is matched as a case sensitive literal


See the comment in the new patch I sent about 'user()' already lowercasing  
'literal:' and 're:'.  I'd consider it a bug, but it's been in since mid  
2012.  Attempting to channel Matt, I'm guessing we are stuck with that  
since it is so old, but wanted to see what others think.



   foo('icase-literal:X')
 → X is matched as a case insensitive literal
   food('re:X')
 → X is matched as a regular expression (case sensitive)

Then, desc move to use string matcher (default to "icase-literal").

We do not need a 'icase-re:' spec, because one can easily achieve it  
using 're:(?i)foo'


Ah! I missed the part in the docs where flags could be set in the string  
with (?). I thought you needed to compile with re.FLAG.  When he  
said string literal, my mind went right to the 'literal:' prefix.  Agreed,  
no need for 'icase-re:'.



[…]
I'm about to submit a patch to add the current 're:' support to 'desc'
in the meantime, to hopefully move this along.


Great!


 I'd also be curious if
you have thoughts on how to conditionally limit this predicate to the
first line, without limiting future functionality.


So having digged the regexp part a bit more, it seems like one could  
just use 're:^.*issue1337' to match "issue1337" on the first line ('.'  
does not match new-line by default).


Thanks for looking at that.  It's way less horrible than I thought it  
would be.  I'm curious what Sean thinks, since he mentioned {firstline}  
being put in as a substitute for a complex regex.  I'd be fine with  
skipping the firstline=True param if this case is mentioned in the help  
for desc().


(I've also got a documentation update that eliminates all of these copies  
of how 're:' and 'literal:' works, once the dust settles on this.)



If you think this is too obscure, what about:

 'desc(issue1337, firstline=True)'

See my other email for details.

Cheers,

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] revset: introduce the summary predicate

2017-01-08 Thread Yuya Nishihara
On Sun, 8 Jan 2017 23:46:00 +0900, Yuya Nishihara wrote:
> On Sun, 8 Jan 2017 13:59:36 +0100, Pierre-Yves David wrote:
> > I'm not 100% sure of what Yuya actually has in mind but here is my 
> > understanding of the situation and how we could move forward.
> > 
> > Currently:
> > --
> > 
> >desc(X) → X is customly matched as a case insensitive litteral,
> > 
> >We have a "generic" pattern definition syntax used by various other 
> > reveset (implemented in "stringmatcher")
> > 
> >  foo(X)
> >→ X is matched as a case sensitive litteral
> >  foo('literal:X')
> >→ X is matched as a case sensitive literal (same as the above)
> >  food('re:X')
> >→ X is matched as a regular expression (case sensitive)
> > 
> > Proposal: (might be what yuya says)
> > -
> > 
> > extend the string matcher to
> > 
> >foo('literal:X')
> >  → X is matched as a case sensitive literal
> >foo('icase-literal:X')
> >  → X is matched as a case insensitive literal
> >food('re:X')
> >  → X is matched as a regular expression (case sensitive)
> > 
> > Then, desc move to use string matcher (default to "icase-literal").
> > 
> > We do not need a 'icase-re:' spec, because one can easily achieve it 
> > using 're:(?i)foo'
> 
> Exactly! IMO, the explicit "literal:" should be case-sensitive so my proposal.
> And case-insensitive literal matcher can be build as a regexp object.
> 
> On Sat, 07 Jan 2017 22:23:33 -0500, Matt Harbison wrote:
> > I don't do any work with locales, so this is probably a silly question.   
> > Does the fact that the API ignores the current locale mean that there are  
> > corner cases where it won't agree with the current method of  
> > encoding.lowercase() everything and comparing?
> 
> Good point. unicode.lower() is locale aware so encoding.lower() would be the
> same, which is slightly different from re.I. But I think that is 
> unintentional.
> e38846a79a23 only mentions encoding issue, which would be a sort of Shift_JIS
> problems, not a locale-dependent case sensitivity.

Never mind, util.stringmatcher() just returns a matcher function so we can
continue using encoding.lower().
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] revset: introduce the summary predicate

2017-01-08 Thread Yuya Nishihara
On Sun, 8 Jan 2017 13:59:36 +0100, Pierre-Yves David wrote:
> I'm not 100% sure of what Yuya actually has in mind but here is my 
> understanding of the situation and how we could move forward.
> 
> Currently:
> --
> 
>desc(X) → X is customly matched as a case insensitive litteral,
> 
>We have a "generic" pattern definition syntax used by various other 
> reveset (implemented in "stringmatcher")
> 
>  foo(X)
>→ X is matched as a case sensitive litteral
>  foo('literal:X')
>→ X is matched as a case sensitive literal (same as the above)
>  food('re:X')
>→ X is matched as a regular expression (case sensitive)
> 
> Proposal: (might be what yuya says)
> -
> 
> extend the string matcher to
> 
>foo('literal:X')
>  → X is matched as a case sensitive literal
>foo('icase-literal:X')
>  → X is matched as a case insensitive literal
>food('re:X')
>  → X is matched as a regular expression (case sensitive)
> 
> Then, desc move to use string matcher (default to "icase-literal").
> 
> We do not need a 'icase-re:' spec, because one can easily achieve it 
> using 're:(?i)foo'

Exactly! IMO, the explicit "literal:" should be case-sensitive so my proposal.
And case-insensitive literal matcher can be build as a regexp object.

On Sat, 07 Jan 2017 22:23:33 -0500, Matt Harbison wrote:
> I don't do any work with locales, so this is probably a silly question.   
> Does the fact that the API ignores the current locale mean that there are  
> corner cases where it won't agree with the current method of  
> encoding.lowercase() everything and comparing?

Good point. unicode.lower() is locale aware so encoding.lower() would be the
same, which is slightly different from re.I. But I think that is unintentional.
e38846a79a23 only mentions encoding issue, which would be a sort of Shift_JIS
problems, not a locale-dependent case sensitivity.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] revset: introduce the summary predicate

2017-01-08 Thread Pierre-Yves David
(ha, I wrote my previous reply in a train and it got sent when I 
connected again (and received that one). I'm going to try to adress the 
new content in this email and sometime repeat some of my other reply 
content for clarity)


On 01/08/2017 04:23 AM, Matt Harbison wrote:

On Sat, 07 Jan 2017 02:56:48 -0500, Yuya Nishihara  wrote:


On Fri, 6 Jan 2017 21:29:43 -0500, Matt Harbison wrote:

> On Jan 6, 2017, at 11:19 AM, Pierre-Yves David
 wrote:
>> On 01/04/2017 07:04 PM, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison 
>> # Date 1483550016 18000
>> #  Wed Jan 04 12:13:36 2017 -0500
>> # Node ID 76d95ab94b9e206363629059fb7824002e19a9e5
>> # Parent  0064a1eb28e246ded9b726c696d048143d1b23f1
>> revset: introduce the summary predicate
>>
>> This is similar to the 'desc()' predicate, however the search is
limited to the
>> first line, which I couldn't figure out how to do with the existing
>> functionality.  Unlike 'desc()', this supports regular
expressions.  The
>> 'matching()' revset already treats the summary line distinctly
from the full
>> description, but that finds similar revisions instead of searching
with a
>> string.
>
> Looks like a great usecase to handle, I want that. However, are
there reasons why:
>
> 1) We could not add full pattern support to desc()
> 2) We could not make this an extra keyworkd parameters of desc()
>
> Multiplying the revset predicate hurts discoverability, having less
but more powerful predicate seems useful.

I share your concern about discoverability.

I started out editing desc(), but it's explicitly documented as case
insensitive.  It seems confusing if matching for literals is case
insensitive, but regex is case sensitive for the same predicate. (I
vaguely recall that you can make regex case insensitive, but I think
that would also surprise the user.)  I meant to mention that maybe we
could add pattern support to desc() in addition to this, but forgot.
The other stuff I looked at that supports patterns, like tag(), is
case sensitive for both literals and regex.  That makes sense, since
those search for identifiers.


Perhaps stringmatcher can have 3 types, icase literal, literal, and
re, and
the default of desc() is icase literal for backward compatibility. You
can
build a case-insensitive regexp object from a literal pattern.

https://docs.python.org/2/library/re.html#re.I


Yep, that's the API I was thinking of.

I'm confused by the rest of your comments.  When I first skimmed your
message, adding support for 'icasere:' using this API popped into my
mind.  And that could support a case insensitive literal, because
'icasere:foo' should be equivalent to looking for the substring 'foo'
(leaving aside efficiency, how discoverable that is, and that
stringmatcher matches the whole string for literals).  But you seem to
be suggesting adding 'icaseliteral:'.


I'm not 100% sure of what Yuya actually has in mind but here is my 
understanding of the situation and how we could move forward.


Currently:
--

  desc(X) → X is customly matched as a case insensitive litteral,

  We have a "generic" pattern definition syntax used by various other 
reveset (implemented in "stringmatcher")


foo(X)
  → X is matched as a case sensitive litteral
foo('literal:X')
  → X is matched as a case sensitive literal (same as the above)
food('re:X')
  → X is matched as a regular expression (case sensitive)

Proposal: (might be what yuya says)
-

extend the string matcher to

  foo('literal:X')
→ X is matched as a case sensitive literal
  foo('icase-literal:X')
→ X is matched as a case insensitive literal
  food('re:X')
→ X is matched as a regular expression (case sensitive)

Then, desc move to use string matcher (default to "icase-literal").

We do not need a 'icase-re:' spec, because one can easily achieve it 
using 're:(?i)foo'



[…]
I'm about to submit a patch to add the current 're:' support to 'desc'
in the meantime, to hopefully move this along.


Great!


 I'd also be curious if
you have thoughts on how to conditionally limit this predicate to the
first line, without limiting future functionality.


So having digged the regexp part a bit more, it seems like one could 
just use 're:^.*issue1337' to match "issue1337" on the first line ('.' 
does not match new-line by default).


If you think this is too obscure, what about:

'desc(issue1337, firstline=True)'

See my other email for details.

Cheers,

--
Pierre-Yves David
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] revset: introduce the summary predicate

2017-01-08 Thread Pierre-Yves David



On 01/07/2017 08:56 AM, Yuya Nishihara wrote:

On Fri, 6 Jan 2017 21:29:43 -0500, Matt Harbison wrote:

On Jan 6, 2017, at 11:19 AM, Pierre-Yves David  
wrote:

On 01/04/2017 07:04 PM, Matt Harbison wrote:
# HG changeset patch
# User Matt Harbison 
# Date 1483550016 18000
#  Wed Jan 04 12:13:36 2017 -0500
# Node ID 76d95ab94b9e206363629059fb7824002e19a9e5
# Parent  0064a1eb28e246ded9b726c696d048143d1b23f1
revset: introduce the summary predicate

This is similar to the 'desc()' predicate, however the search is limited to the
first line, which I couldn't figure out how to do with the existing
functionality.  Unlike 'desc()', this supports regular expressions.  The
'matching()' revset already treats the summary line distinctly from the full
description, but that finds similar revisions instead of searching with a
string.


Looks like a great usecase to handle, I want that. However, are there reasons 
why:

1) We could not add full pattern support to desc()
2) We could not make this an extra keyworkd parameters of desc()

Multiplying the revset predicate hurts discoverability, having less but more 
powerful predicate seems useful.


I share your concern about discoverability.

I started out editing desc(), but it's explicitly documented as case 
insensitive.  It seems confusing if matching for literals is case insensitive, 
but regex is case sensitive for the same predicate. (I vaguely recall that you 
can make regex case insensitive, but I think that would also surprise the 
user.)  I meant to mention that maybe we could add pattern support to desc() in 
addition to this, but forgot.  The other stuff I looked at that supports 
patterns, like tag(), is case sensitive for both literals and regex.  That 
makes sense, since those search for identifiers.


Perhaps stringmatcher can have 3 types, icase literal, literal, and re, and
the default of desc() is icase literal for backward compatibility. You can
build a case-insensitive regexp object from a literal pattern.

https://docs.python.org/2/library/re.html#re.I


Yep, I think it is fine to have a desc() upgraded to more powerful 
pattern. We must keep the default to "icase literal" but I don't see the 
need to make the regex case insensitive with "desc". The user can 
specify the case insensitivity flag to the regex if they needs it.


We just have to make sure to clearly document the case-insensitivity of 
the default as it is different from most other revset with rich-pattern 
matching.



I didn't consider #2, because of the issues with #1.  Can you suggest a format 
for this? I don't see any other ways to slice and dice the description, so it 
kinda feels like the recently discussed boolean arg, i.e. the equivalent of:

list get_matches(char *str, bool summary_only);

I'm not strongly opposed to this;


If we get regexp support, we can probably easily build a regexp that 
restrict search to the first line (but I've not checked what mode we use 
by default so I' might be wrong). If regex fill this need we might 
consider thats enough and we don't need anything else UI wise.


What do you think?

If you think something more explicit/user friendly is needed, I think a 
boolean keywork-only argument restricting to firstline would do:


'desc(issue1337, firstline=True)'

(But if regexp seems good enough to you, I would rather stick to a 
simpler UI, new predicate < new parameter < no parameter ;-) )



I care more about the functionality than the incantation.  That said, there's 
been some serious bikeshedding at work about whether bug markers belong at the 
beginning or end of the summary.  The argument for at the beginning seems to be 
that it's easier to find by visually scanning each summary.  I'd like to push 
them towards the query tools, over eyeing things up.  Simpler and more concise 
will make it easier for the uninitiated.  Of course, I could probably create a 
shorter alias if needed.


(That part is more about your personal work-flow at work. We have been 
using "issue tag in summary" for Mercurial and I think it works well in 
most case and help when you scan the log. In practice that really 
depends how you use that tags and what it means in your context.



The name make perfect sense when you think about it at first contact "summary" made me 
though about "hg summary" and I was a bit confused about what this could do. I wonder if 
we can have something better (but questions about desc seems more important first)


I can see that, but I don't have any better ideas.  I mentioned the "summary" 
field in the matching() predicate to point to prior art and consistency in the revset 
arena.


Ha right, if "summary" is already used that way in a revset context the 
name seems fine. Sorry for missing that.


Cheers

--
Pierre-Yves David
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org

Re: [PATCH] revset: introduce the summary predicate

2017-01-07 Thread Matt Harbison

On Sat, 07 Jan 2017 02:56:48 -0500, Yuya Nishihara  wrote:


On Fri, 6 Jan 2017 21:29:43 -0500, Matt Harbison wrote:
> On Jan 6, 2017, at 11:19 AM, Pierre-Yves David  
 wrote:

>> On 01/04/2017 07:04 PM, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison 
>> # Date 1483550016 18000
>> #  Wed Jan 04 12:13:36 2017 -0500
>> # Node ID 76d95ab94b9e206363629059fb7824002e19a9e5
>> # Parent  0064a1eb28e246ded9b726c696d048143d1b23f1
>> revset: introduce the summary predicate
>>
>> This is similar to the 'desc()' predicate, however the search is  
limited to the

>> first line, which I couldn't figure out how to do with the existing
>> functionality.  Unlike 'desc()', this supports regular expressions.   
The
>> 'matching()' revset already treats the summary line distinctly from  
the full
>> description, but that finds similar revisions instead of searching  
with a

>> string.
>
> Looks like a great usecase to handle, I want that. However, are there  
reasons why:

>
> 1) We could not add full pattern support to desc()
> 2) We could not make this an extra keyworkd parameters of desc()
>
> Multiplying the revset predicate hurts discoverability, having less  
but more powerful predicate seems useful.


I share your concern about discoverability.

I started out editing desc(), but it's explicitly documented as case  
insensitive.  It seems confusing if matching for literals is case  
insensitive, but regex is case sensitive for the same predicate. (I  
vaguely recall that you can make regex case insensitive, but I think  
that would also surprise the user.)  I meant to mention that maybe we  
could add pattern support to desc() in addition to this, but forgot.   
The other stuff I looked at that supports patterns, like tag(), is case  
sensitive for both literals and regex.  That makes sense, since those  
search for identifiers.


Perhaps stringmatcher can have 3 types, icase literal, literal, and re,  
and
the default of desc() is icase literal for backward compatibility. You  
can

build a case-insensitive regexp object from a literal pattern.

https://docs.python.org/2/library/re.html#re.I


Yep, that's the API I was thinking of.

I'm confused by the rest of your comments.  When I first skimmed your  
message, adding support for 'icasere:' using this API popped into my  
mind.  And that could support a case insensitive literal, because  
'icasere:foo' should be equivalent to looking for the substring 'foo'  
(leaving aside efficiency, how discoverable that is, and that  
stringmatcher matches the whole string for literals).  But you seem to be  
suggesting adding 'icaseliteral:'.


I don't do any work with locales, so this is probably a silly question.   
Does the fact that the API ignores the current locale mean that there are  
corner cases where it won't agree with the current method of  
encoding.lowercase() everything and comparing?


I'm about to submit a patch to add the current 're:' support to 'desc' in  
the meantime, to hopefully move this along.  I'd also be curious if you  
have thoughts on how to conditionally limit this predicate to the first  
line, without limiting future functionality.

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] revset: introduce the summary predicate

2017-01-07 Thread Gregory Szorc
On Wed, Jan 4, 2017 at 10:04 AM, Matt Harbison 
wrote:

> # HG changeset patch
> # User Matt Harbison 
> # Date 1483550016 18000
> #  Wed Jan 04 12:13:36 2017 -0500
> # Node ID 76d95ab94b9e206363629059fb7824002e19a9e5
> # Parent  0064a1eb28e246ded9b726c696d048143d1b23f1
> revset: introduce the summary predicate
>
> This is similar to the 'desc()' predicate, however the search is limited
> to the
> first line, which I couldn't figure out how to do with the existing
> functionality.  Unlike 'desc()', this supports regular expressions.  The
> 'matching()' revset already treats the summary line distinctly from the
> full
> description, but that finds similar revisions instead of searching with a
> string.
>
> I'm torn on whether it should support case insensitivity, like 'desc()' and
> 'keyword()'.  It can be handy, but it seems like it would be surprising if
> literals and regex are handled differently, or if the regex was adjusted
> to be
> case insensitive internally.
>
> The use case is to be able to select bug fixes out of the summary line,
> while
> ignoring similar markers within the body of the commit.  I've seen
> previously
> where the bug bot will erroneously update a bug that is mentioned as a
> parenthetical within the detail of the commit message.  To illustrate on
> the
> local Mercurial repo (some old markers have an internal space):
>
>   $ hg log -r "desc(r' (issue')" -T "{rev}\n" | wc -l
>   1672
>   $ hg log -r "desc(r' (issue')" -T "{desc|firstline}\n"  \
>   >| egrep '\(issue' | wc -l
>   1633
>   $ hg log -r "desc(r' (issue')" -T "{desc|firstline}\n"  \
>   >| egrep '\(issue[0-9]+\)' | wc -l
>   1560
>   $ hg log -r "summary(r're: \(issue\d+\)')" -T "{rev}\n" | wc -l
>   1560
>
> I'm not sure why the splitlines() logic is that complicated, but I
> copy/pasted
> from templatefilters.firstline() for consistency.
>

Yeah, that is unfortunate. I'm not sure if it matters, but calling
splitlines() may lead to performance issues having to create tons of
throwaway strings as part of the returned list. Normally I'd say to not
worry about it, but revsets run very quickly and it could create enough GC
overhead to cause problems on large repos.

A `desc[:desc.index('\n')]` (inside a try..except ValueError) feels like it
should be faster (and it should just work unless we care about bare CR,
which was used on some older platforms). Even a `desc.split('\n', 1)` feels
better than full splitlines(). The patch shouldn't be blocked on it, but it
would be nice to have measurements for peace of mind.


>
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -2197,6 +2197,32 @@ def subrepo(repo, subset, x):
>
>  return subset.filter(matches, condrepr=('', pat))
>
> +@predicate('summary(string)', safe=True)
> +def summary(repo, subset, x):
> +"""Changesets with the given string in the summary line.  The match
> is case
> +sensitive.
> +
> +If `value` starts with `re:`, the remainder of the value is treated as
> +a regular expression. To match a value that actually starts with
> `re:`,
> +use the prefix `literal:`.
> +"""
> +
> +# i18n: "summary" is a keyword
> +s = getstring(x, _("summary requires a string"))
> +kind, value, matcher = _substringmatcher(s)
> +
> +def _matchvalue(r):
> +_desc = repo[r].description()
> +_sum = ''
> +try:
> +_sum = _desc.splitlines(True)[0].rstrip('\r\n')
> +except IndexError:
> +pass
> +
> +return matcher(_sum)
> +
> +return subset.filter(lambda r: _matchvalue(r), condrepr=(' %r>', s))
> +
>  def _substringmatcher(pattern):
>  kind, pattern, matcher = util.stringmatcher(pattern)
>  if kind == 'literal':
> diff --git a/tests/test-commit.t b/tests/test-commit.t
> --- a/tests/test-commit.t
> +++ b/tests/test-commit.t
> @@ -252,13 +252,50 @@ full log
>commit-foo-subdir
>
>
> +  $ cat > ../commit-log-test < +  > issue fix (bug1234)
> +  >
> +  > multiline description here
> +  > EOF
> +  $ echo "edited" >> foo/plain-file
> +  $ hg ci -l ../commit-log-test
> +
> +Summary doesn't search past the first line
> +  $ hg log -r "summary('description')"
> +
> +Summary works with regex
> +  $ hg log -r "summary(r're:\(bug\d+\)')"
> +  changeset:   2:5364ddbe8265
> +  tag: tip
> +  user:test
> +  date:Thu Jan 01 00:00:00 1970 +
> +  summary: issue fix (bug1234)
> +
> +
> +Summary works with partial matches
> +  $ hg log -r "summary('foo')"
> +  changeset:   0:65d4e9386227
> +  user:test
> +  date:Thu Jan 01 00:00:00 1970 +
> +  summary: commit-foo-subdir
> +
> +  changeset:   1:95b38e3a5b2e
> +  user:test
> +  date:Thu Jan 01 00:00:00 1970 +
> +  summary: commit-foo-dot
> +
>
>  subdir log
>
>$ cd foo
>$ hg log .
> +  changeset:   2:5364ddbe8265
> 

Re: [PATCH] revset: introduce the summary predicate

2017-01-06 Thread Yuya Nishihara
On Fri, 6 Jan 2017 21:29:43 -0500, Matt Harbison wrote:
> > On Jan 6, 2017, at 11:19 AM, Pierre-Yves David 
> >  wrote:
> >> On 01/04/2017 07:04 PM, Matt Harbison wrote:
> >> # HG changeset patch
> >> # User Matt Harbison 
> >> # Date 1483550016 18000
> >> #  Wed Jan 04 12:13:36 2017 -0500
> >> # Node ID 76d95ab94b9e206363629059fb7824002e19a9e5
> >> # Parent  0064a1eb28e246ded9b726c696d048143d1b23f1
> >> revset: introduce the summary predicate
> >> 
> >> This is similar to the 'desc()' predicate, however the search is limited 
> >> to the
> >> first line, which I couldn't figure out how to do with the existing
> >> functionality.  Unlike 'desc()', this supports regular expressions.  The
> >> 'matching()' revset already treats the summary line distinctly from the 
> >> full
> >> description, but that finds similar revisions instead of searching with a
> >> string.
> > 
> > Looks like a great usecase to handle, I want that. However, are there 
> > reasons why:
> > 
> > 1) We could not add full pattern support to desc()
> > 2) We could not make this an extra keyworkd parameters of desc()
> > 
> > Multiplying the revset predicate hurts discoverability, having less but 
> > more powerful predicate seems useful.
> 
> I share your concern about discoverability.
> 
> I started out editing desc(), but it's explicitly documented as case 
> insensitive.  It seems confusing if matching for literals is case 
> insensitive, but regex is case sensitive for the same predicate. (I vaguely 
> recall that you can make regex case insensitive, but I think that would also 
> surprise the user.)  I meant to mention that maybe we could add pattern 
> support to desc() in addition to this, but forgot.  The other stuff I looked 
> at that supports patterns, like tag(), is case sensitive for both literals 
> and regex.  That makes sense, since those search for identifiers.

Perhaps stringmatcher can have 3 types, icase literal, literal, and re, and
the default of desc() is icase literal for backward compatibility. You can
build a case-insensitive regexp object from a literal pattern.

https://docs.python.org/2/library/re.html#re.I
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] revset: introduce the summary predicate

2017-01-06 Thread Matt Harbison

> On Jan 6, 2017, at 11:19 AM, Pierre-Yves David 
>  wrote:
> 
> 
> 
>> On 01/04/2017 07:04 PM, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison 
>> # Date 1483550016 18000
>> #  Wed Jan 04 12:13:36 2017 -0500
>> # Node ID 76d95ab94b9e206363629059fb7824002e19a9e5
>> # Parent  0064a1eb28e246ded9b726c696d048143d1b23f1
>> revset: introduce the summary predicate
>> 
>> This is similar to the 'desc()' predicate, however the search is limited to 
>> the
>> first line, which I couldn't figure out how to do with the existing
>> functionality.  Unlike 'desc()', this supports regular expressions.  The
>> 'matching()' revset already treats the summary line distinctly from the full
>> description, but that finds similar revisions instead of searching with a
>> string.
> 
> Looks like a great usecase to handle, I want that. However, are there reasons 
> why:
> 
> 1) We could not add full pattern support to desc()
> 2) We could not make this an extra keyworkd parameters of desc()
> 
> Multiplying the revset predicate hurts discoverability, having less but more 
> powerful predicate seems useful.

I share your concern about discoverability.

I started out editing desc(), but it's explicitly documented as case 
insensitive.  It seems confusing if matching for literals is case insensitive, 
but regex is case sensitive for the same predicate. (I vaguely recall that you 
can make regex case insensitive, but I think that would also surprise the 
user.)  I meant to mention that maybe we could add pattern support to desc() in 
addition to this, but forgot.  The other stuff I looked at that supports 
patterns, like tag(), is case sensitive for both literals and regex.  That 
makes sense, since those search for identifiers.

I didn't consider #2, because of the issues with #1.  Can you suggest a format 
for this? I don't see any other ways to slice and dice the description, so it 
kinda feels like the recently discussed boolean arg, i.e. the equivalent of:

list get_matches(char *str, bool summary_only);

I'm not strongly opposed to this; I care more about the functionality than the 
incantation.  That said, there's been some serious bikeshedding at work about 
whether bug markers belong at the beginning or end of the summary.  The 
argument for at the beginning seems to be that it's easier to find by visually 
scanning each summary.  I'd like to push them towards the query tools, over 
eyeing things up.  Simpler and more concise will make it easier for the 
uninitiated.  Of course, I could probably create a shorter alias if needed.

> The name make perfect sense when you think about it at first contact 
> "summary" made me though about "hg summary" and I was a bit confused about 
> what this could do. I wonder if we can have something better (but questions 
> about desc seems more important first)

I can see that, but I don't have any better ideas.  I mentioned the "summary" 
field in the matching() predicate to point to prior art and consistency in the 
revset arena.

> 
>> I'm torn on whether it should support case insensitivity, like 'desc()' and
>> 'keyword()'.  It can be handy, but it seems like it would be surprising if
>> literals and regex are handled differently, or if the regex was adjusted to 
>> be
>> case insensitive internally.
>> 
>> The use case is to be able to select bug fixes out of the summary line, while
>> ignoring similar markers within the body of the commit.  I've seen previously
>> where the bug bot will erroneously update a bug that is mentioned as a
>> parenthetical within the detail of the commit message.  To illustrate on the
>> local Mercurial repo (some old markers have an internal space):
>> 
>>  $ hg log -r "desc(r' (issue')" -T "{rev}\n" | wc -l
>>  1672
>>  $ hg log -r "desc(r' (issue')" -T "{desc|firstline}\n"  \
>>  >| egrep '\(issue' | wc -l
>>  1633
>>  $ hg log -r "desc(r' (issue')" -T "{desc|firstline}\n"  \
>>  >| egrep '\(issue[0-9]+\)' | wc -l
>>  1560
>>  $ hg log -r "summary(r're: \(issue\d+\)')" -T "{rev}\n" | wc -l
>>  1560
>> 
>> I'm not sure why the splitlines() logic is that complicated, but I 
>> copy/pasted
>> from templatefilters.firstline() for consistency.
>> 
>> diff --git a/mercurial/revset.py b/mercurial/revset.py
>> --- a/mercurial/revset.py
>> +++ b/mercurial/revset.py
>> @@ -2197,6 +2197,32 @@ def subrepo(repo, subset, x):
>> 
>> return subset.filter(matches, condrepr=('', pat))
>> 
>> +@predicate('summary(string)', safe=True)
>> +def summary(repo, subset, x):
>> +"""Changesets with the given string in the summary line.  The match is 
>> case
>> +sensitive.
>> +
>> +If `value` starts with `re:`, the remainder of the value is treated as
>> +a regular expression. To match a value that actually starts with `re:`,
>> +use the prefix `literal:`.
>> +"""
>> +
>> +# i18n: "summary" is a keyword
>> +s = getstring(x, _("summary 

Re: [PATCH] revset: introduce the summary predicate

2017-01-06 Thread Pierre-Yves David



On 01/04/2017 07:04 PM, Matt Harbison wrote:

# HG changeset patch
# User Matt Harbison 
# Date 1483550016 18000
#  Wed Jan 04 12:13:36 2017 -0500
# Node ID 76d95ab94b9e206363629059fb7824002e19a9e5
# Parent  0064a1eb28e246ded9b726c696d048143d1b23f1
revset: introduce the summary predicate

This is similar to the 'desc()' predicate, however the search is limited to the
first line, which I couldn't figure out how to do with the existing
functionality.  Unlike 'desc()', this supports regular expressions.  The
'matching()' revset already treats the summary line distinctly from the full
description, but that finds similar revisions instead of searching with a
string.


Looks like a great usecase to handle, I want that. However, are there 
reasons why:


1) We could not add full pattern support to desc()
2) We could not make this an extra keyworkd parameters of desc()

Multiplying the revset predicate hurts discoverability, having less but 
more powerful predicate seems useful.


The name make perfect sense when you think about it at first contact 
"summary" made me though about "hg summary" and I was a bit confused 
about what this could do. I wonder if we can have something better (but 
questions about desc seems more important first)




I'm torn on whether it should support case insensitivity, like 'desc()' and
'keyword()'.  It can be handy, but it seems like it would be surprising if
literals and regex are handled differently, or if the regex was adjusted to be
case insensitive internally.

The use case is to be able to select bug fixes out of the summary line, while
ignoring similar markers within the body of the commit.  I've seen previously
where the bug bot will erroneously update a bug that is mentioned as a
parenthetical within the detail of the commit message.  To illustrate on the
local Mercurial repo (some old markers have an internal space):

  $ hg log -r "desc(r' (issue')" -T "{rev}\n" | wc -l
  1672
  $ hg log -r "desc(r' (issue')" -T "{desc|firstline}\n"  \
  >| egrep '\(issue' | wc -l
  1633
  $ hg log -r "desc(r' (issue')" -T "{desc|firstline}\n"  \
  >| egrep '\(issue[0-9]+\)' | wc -l
  1560
  $ hg log -r "summary(r're: \(issue\d+\)')" -T "{rev}\n" | wc -l
  1560

I'm not sure why the splitlines() logic is that complicated, but I copy/pasted
from templatefilters.firstline() for consistency.

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -2197,6 +2197,32 @@ def subrepo(repo, subset, x):

 return subset.filter(matches, condrepr=('', pat))

+@predicate('summary(string)', safe=True)
+def summary(repo, subset, x):
+"""Changesets with the given string in the summary line.  The match is case
+sensitive.
+
+If `value` starts with `re:`, the remainder of the value is treated as
+a regular expression. To match a value that actually starts with `re:`,
+use the prefix `literal:`.
+"""
+
+# i18n: "summary" is a keyword
+s = getstring(x, _("summary requires a string"))
+kind, value, matcher = _substringmatcher(s)
+
+def _matchvalue(r):
+_desc = repo[r].description()
+_sum = ''
+try:
+_sum = _desc.splitlines(True)[0].rstrip('\r\n')
+except IndexError:
+pass
+
+return matcher(_sum)
+
+return subset.filter(lambda r: _matchvalue(r), condrepr=('', 
s))
+
 def _substringmatcher(pattern):
 kind, pattern, matcher = util.stringmatcher(pattern)
 if kind == 'literal':
diff --git a/tests/test-commit.t b/tests/test-commit.t
--- a/tests/test-commit.t
+++ b/tests/test-commit.t
@@ -252,13 +252,50 @@ full log
   commit-foo-subdir


+  $ cat > ../commit-log-test < issue fix (bug1234)
+  >
+  > multiline description here
+  > EOF
+  $ echo "edited" >> foo/plain-file
+  $ hg ci -l ../commit-log-test
+
+Summary doesn't search past the first line
+  $ hg log -r "summary('description')"
+
+Summary works with regex
+  $ hg log -r "summary(r're:\(bug\d+\)')"
+  changeset:   2:5364ddbe8265
+  tag: tip
+  user:test
+  date:Thu Jan 01 00:00:00 1970 +
+  summary: issue fix (bug1234)
+
+
+Summary works with partial matches
+  $ hg log -r "summary('foo')"
+  changeset:   0:65d4e9386227
+  user:test
+  date:Thu Jan 01 00:00:00 1970 +
+  summary: commit-foo-subdir
+
+  changeset:   1:95b38e3a5b2e
+  user:test
+  date:Thu Jan 01 00:00:00 1970 +
+  summary: commit-foo-dot
+

 subdir log

   $ cd foo
   $ hg log .
+  changeset:   2:5364ddbe8265
+  tag: tip
+  user:test
+  date:Thu Jan 01 00:00:00 1970 +
+  summary: issue fix (bug1234)
+
   changeset:   1:95b38e3a5b2e
-  tag: tip
   user:test
   date:Thu Jan 01 00:00:00 1970 +
   summary: commit-foo-dot
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org