Re: Support character classes in glob authz rules

2018-12-03 Thread Julian Foad
Branko Čibej wrote:
> https://issues.apache.org/jira/browse/SVN-4204
> https://issues.apache.org/jira/browse/SVN-4795
> 
> Even though apr_fnmatch(), which we use for matching glob patterns in
> authz rules, supports character classes ([A-Z] etc.), we can't use them
> in the glob rules because of the way the config parser works. For
> example, the following rule:
> 
> [:glob:/**/*.[Dd]oc]
> 
> will be silently parsed to ":glob:/**/*.[Dd" [because] our config parser
> strictly follows the semantics of Python's ConfigParser and treats the
> first ']' it finds as the section terminator, ignoring any remaining
> characters to the end of the line.
> 
> The proposed patch changes this behaviour as follows:
> 
>   * the /last/ ']' in the line is the section terminator;
>   * only whitespace is allowed after the terminator to the end of the line.

Glad to see a proposal.

It makes me uncomfortable to depart from standard parsing. What if users are 
relying on Python ConfigParser or other compatible parsing as part of their 
Subversion authz infrastructure?

First I wondered if anything bad could happen if there's a silent change in 
meaning where a user has written, let's say,

> [:glob:/**/secret1]# was [:glob:/**/secret2]

It's hard to find any plausible example that would successfully parse and 
actually match something, but may be possible.

> The proposed change in the parser is only enabled for parsing authz and
> global group files, other Subversion configuration files will use the
> current semantics.

These sorts of quirks tend to make a system hard to maintain in the long run. 
What if we find another case where we'd like to depart from that parsing? How 
far would we go?

What alternative solution could we consider?

- Julian


Re: Support character classes in glob authz rules

2018-12-03 Thread Stefan Sperling
On Mon, Dec 03, 2018 at 08:15:24AM +, Julian Foad wrote:
> Branko Čibej wrote:
> > The proposed change in the parser is only enabled for parsing authz and
> > global group files, other Subversion configuration files will use the
> > current semantics.
> 
> These sorts of quirks tend to make a system hard to maintain in the long run. 
> What if we find another case where we'd like to depart from that parsing? How 
> far would we go?
> 
> What alternative solution could we consider?

Some sort of escaping mechanism might work.


Re: How to join in to dev@

2018-12-03 Thread Paul Hammant
Hi Nathan,

Some Apache mail lists won't like HTML email or "top-posting". I think the
subversion dev mail list is one of those, but I could be wrong.

You subscribe by emailing dev-subscr...@subversion.apache.org :)

On GitHub Pull-Requests - yes that'd be super useful. Julian has been
developing "shelve" and "checkpoint" which would doubtless be the basis for
light-weight pull-requests.  At least "patch sets" was in Google some 12-14
years ago (for their Perforce configuration of their monorepo -
https://trunkbaseddevelopment.com/game-changers/#mondrian-2006)

- Paul


Re: Support character classes in glob authz rules

2018-12-03 Thread Stefan Sperling
On Mon, Dec 03, 2018 at 09:30:53AM +0100, Stefan Sperling wrote:
> On Mon, Dec 03, 2018 at 08:15:24AM +, Julian Foad wrote:
> > Branko Čibej wrote:
> > > The proposed change in the parser is only enabled for parsing authz and
> > > global group files, other Subversion configuration files will use the
> > > current semantics.
> > 
> > These sorts of quirks tend to make a system hard to maintain in the long 
> > run. What if we find another case where we'd like to depart from that 
> > parsing? How far would we go?
> > 
> > What alternative solution could we consider?
> 
> Some sort of escaping mechanism might work.

See also https://issues.apache.org/jira/browse/SVN-4784

It would be nice if a solution to the authz issues would address
that issue as well.


Re: Support character classes in glob authz rules

2018-12-03 Thread Branko Čibej
On 03.12.2018 09:15, Julian Foad wrote:
> Branko Čibej wrote:
>> https://issues.apache.org/jira/browse/SVN-4204
>> https://issues.apache.org/jira/browse/SVN-4795
>>
>> Even though apr_fnmatch(), which we use for matching glob patterns in
>> authz rules, supports character classes ([A-Z] etc.), we can't use them
>> in the glob rules because of the way the config parser works. For
>> example, the following rule:
>>
>> [:glob:/**/*.[Dd]oc]
>>
>> will be silently parsed to ":glob:/**/*.[Dd" [because] our config parser
>> strictly follows the semantics of Python's ConfigParser and treats the
>> first ']' it finds as the section terminator, ignoring any remaining
>> characters to the end of the line.
>>
>> The proposed patch changes this behaviour as follows:
>>
>>   * the /last/ ']' in the line is the section terminator;
>>   * only whitespace is allowed after the terminator to the end of the line.
> Glad to see a proposal.
>
> It makes me uncomfortable to depart from standard parsing. What if users are 
> relying on Python ConfigParser or other compatible parsing as part of their 
> Subversion authz infrastructure?

Then they're not using character classes in glob patterns.

The problem is that ConfigParser does not define an escaping method for
the ], so we can't even adopt that in our parser — and if we did, then
existing rules could silently change their meaning in really nasty ways.

> First I wondered if anything bad could happen if there's a silent change in 
> meaning where a user has written, let's say,
>
>> [:glob:/**/secret1]# was [:glob:/**/secret2]

Yes, this is the sort of case where the meaning would change.

> It's hard to find any plausible example that would successfully parse and 
> actually match something, but may be possible.

If one stretches "plausible" far enough ...

>> The proposed change in the parser is only enabled for parsing authz and
>> global group files, other Subversion configuration files will use the
>> current semantics.
> These sorts of quirks tend to make a system hard to maintain in the long run. 
> What if we find another case where we'd like to depart from that parsing? How 
> far would we go?
>
> What alternative solution could we consider?

I can't think of a solution that would not depart from ConfigParser
semantics in one way or another. As for "the system" ... well, the fact
that config and authz parsing shares common code is really just an
accident, and certainly an implementation detail.

-- Brane



Re: Support character classes in glob authz rules

2018-12-03 Thread Branko Čibej
On 03.12.2018 09:39, Stefan Sperling wrote:
> On Mon, Dec 03, 2018 at 09:30:53AM +0100, Stefan Sperling wrote:
>> On Mon, Dec 03, 2018 at 08:15:24AM +, Julian Foad wrote:
>>> Branko Čibej wrote:
 The proposed change in the parser is only enabled for parsing authz and
 global group files, other Subversion configuration files will use the
 current semantics.
>>> These sorts of quirks tend to make a system hard to maintain in the long 
>>> run. What if we find another case where we'd like to depart from that 
>>> parsing? How far would we go?
>>>
>>> What alternative solution could we consider?
>> Some sort of escaping mechanism might work.
> See also https://issues.apache.org/jira/browse/SVN-4784
>
> It would be nice if a solution to the authz issues would address
> that issue as well.

"Some sort of excaping" also silently changes existing rules and, worse,
does it in far more subtle ways than my proposed change.

SVN-4784 is a different issue and should be solved in the auto-props
code; I added a proposal in Jira but haven't had time to bring it to
this list.

-- Brane


Re: Support character classes in glob authz rules

2018-12-03 Thread Julian Foad
Branko Čibej wrote:
> I can't think of a solution that would not depart from ConfigParser
> semantics in one way or another. As for "the system" ... well, the fact
> that config and authz parsing shares common code is really just an
> accident, and certainly an implementation detail.

OK, that's fair. Your solution does seem to be a backward-compatible way that 
practically should work well to cope with this case.

I was going to say "this new case", but the square-bracket problem is not new 
with glob rules: any filename can have square brackets in it, on Unix.

If we treat the authz format as its own format, that does make me wonder if 
there are any other limitations the current ConfigParser format is applying, 
that should also be lifted.

For example, if "@harry" is parsed as a reference to group name "harry", then 
is there also a way to specify a username that is literally "@harry"?

In general, is there a programmatic transformation from arbitrary valid user 
names and paths to corresponding authz entries?

-- 
- Julian


Re: Support character classes in glob authz rules

2018-12-03 Thread Branko Čibej
On 03.12.2018 10:39, Julian Foad wrote:
> Branko Čibej wrote:
>> I can't think of a solution that would not depart from ConfigParser
>> semantics in one way or another. As for "the system" ... well, the fact
>> that config and authz parsing shares common code is really just an
>> accident, and certainly an implementation detail.
> OK, that's fair. Your solution does seem to be a backward-compatible way that 
> practically should work well to cope with this case.
>
> I was going to say "this new case", but the square-bracket problem is not new 
> with glob rules: any filename can have square brackets in it, on Unix.


Yes, the original issue #4204 predates glob rules.


> If we treat the authz format as its own format, that does make me wonder if 
> there are any other limitations the current ConfigParser format is applying, 
> that should also be lifted.
>
> For example, if "@harry" is parsed as a reference to group name "harry", then 
> is there also a way to specify a username that is literally "@harry"?


Well, first of all, this has nothing to do with the ConfigParser, but
with how our authz infrastructure interprets the access entry selectors.
And no, there's no way to do this.


> In general, is there a programmatic transformation from arbitrary valid user 
> names and paths to corresponding authz entries?

Define "arbitrary valid" and I'll answer that. :)


We've always had the following restictions: usernames can't begin with
'@' or '&' or '~', and similar limitations apply to group and alias
names. Such identifiers have always been rejected at one stage or
another, and this has not been a problem.

Paths in rules are interpreted literally (either as literal paths or as
literal glob patterns), without any excaping mechanism. This hasn't
changed since the inception of authz rules, either. The only really
problematic case is having ']' in the rule, and with glob patterns, this
becomes more likely (or desired).

-- Brane


Re: Support character classes in glob authz rules

2018-12-03 Thread Julian Foad
Branko Čibej wrote:
>> If we treat the authz format as its own format, that does make me wonder if 
>> there are any other limitations the current ConfigParser format is applying, 
>> that should also be lifted.
>>
>> For example, if "@harry" is parsed as a reference to group name "harry", 
>> then is there also a way to specify a username that is literally "@harry"?
> 
> Well, first of all, this has nothing to do with the ConfigParser, but
> with how our authz infrastructure interprets the access entry selectors.

Accepted.

> And no, there's no way to do this.
> > In general, is there a programmatic transformation from arbitrary valid 
> > user names and paths to corresponding authz entries?
> 
> Define "arbitrary valid" and I'll answer that. :)

Ones that Subversion otherwise accepts, apart from in this context.

> We've always had the following restictions: usernames can't begin with
> '@' or '&' or '~',

There must be also limitations with '#', '*', '=' and whitespace characters, 
and I don't see a way to find a complete list.

> and similar limitations apply to group and alias
> names. Such identifiers have always been rejected at one stage or
> another, and this has not been a problem.

We haven't heard vocal complaints. It could well have caused headaches for 
those implementing authz in their systems. (Not because they need to use such 
usernames, but because they need to defensively program against an incompletely 
known set of problems.)

Would I be totally off the mark in suggesting an enhancement request for an 
authz file format that addresses these issues? It seems to me this is the sort 
of thing "enterprise" users would value.

-- 
- Julian


Re: Support character classes in glob authz rules

2018-12-03 Thread Michael Pilato
On 12/3/18 3:15 AM, Julian Foad wrote:
> It makes me uncomfortable to depart from standard parsing. What if
> users are relying on Python ConfigParser or other compatible parsing
> as part of their Subversion authz infrastructure?

We needn't keep this hypothetical.  ViewVC is using (a slightly
modified[*]) Python ConfigParser in this way.

-- Mike

[*] By default, ConfigParser (well, really the RawConfigParser it
subclasses) lowercases option names, which can cause username/group
matching to fail.  So ViewVC's code replaces the optionxform method of
ConfigParser with a noop lambda function.  (See
https://docs.python.org/2/library/configparser.html#ConfigParser.RawConfigParser.optionxform
for official docs.)


Re: Support character classes in glob authz rules

2018-12-03 Thread Branko Čibej
On 03.12.2018 11:42, Julian Foad wrote:
> Branko Čibej wrote:
>>> If we treat the authz format as its own format, that does make me wonder if 
>>> there are any other limitations the current ConfigParser format is 
>>> applying, that should also be lifted.
>>>
>>> For example, if "@harry" is parsed as a reference to group name "harry", 
>>> then is there also a way to specify a username that is literally "@harry"?
>> Well, first of all, this has nothing to do with the ConfigParser, but
>> with how our authz infrastructure interprets the access entry selectors.
> Accepted.
>
>> And no, there's no way to do this.
>>> In general, is there a programmatic transformation from arbitrary valid 
>>> user names and paths to corresponding authz entries?
>> Define "arbitrary valid" and I'll answer that. :)
> Ones that Subversion otherwise accepts, apart from in this context.
>
>> We've always had the following restictions: usernames can't begin with
>> '@' or '&' or '~',
> There must be also limitations with '#', '*', '=' and whitespace characters, 
> and I don't see a way to find a complete list.

True. Also '$' since we have the magic '$authenticated' and '$anonymous'
tokens.

The rules are not explicitly documented anywhere, but they're implied by
the documentation in The Book. With a bit of luck, I'll have these
spelled out on the wiki some day.

>> and similar limitations apply to group and alias
>> names. Such identifiers have always been rejected at one stage or
>> another, and this has not been a problem.
> We haven't heard vocal complaints. It could well have caused headaches for 
> those implementing authz in their systems. (Not because they need to use such 
> usernames, but because they need to defensively program against an 
> incompletely known set of problems.)

Well, realistically, user identifiers are likely to take one of the
following forms, with reasonable variations:

  * usernme
  * name.surname
  * usern...@example.com
  * CN=Name Surname,OU=marketing,DC=example,DC=com


While some authentication systems (pam_unix FTW) theoretically allow
funny usernames with leading '@', '&', '$' etc., such names are very,
very unlikely to be used in practice. Our authz parser will accept any
of the forms I listed above.

(The Distinguished Name has to be mapped through an alias because it
containe '=', this is documented in The Book.)

> Would I be totally off the mark in suggesting an enhancement request for an 
> authz file format that addresses these issues? It seems to me this is the 
> sort of thing "enterprise" users would value.

I don't think it's necessary. And until and unless we get an actual bug
report, I'd leave well enough alone. Inventing some escaping mechanism
post-hoc always leads to strange side effects, making some currently
valid identifiers suddenly invalid or interpreted differently.

-- Brane

P.S.: Users can't have ':' in the name of a repository for
repository-specific rules, either; this, too have never been a problem.



Re: Support character classes in glob authz rules

2018-12-03 Thread Branko Čibej
On 03.12.2018 16:07, Michael Pilato wrote:
> On 12/3/18 3:15 AM, Julian Foad wrote:
>> It makes me uncomfortable to depart from standard parsing. What if
>> users are relying on Python ConfigParser or other compatible parsing
>> as part of their Subversion authz infrastructure?
> We needn't keep this hypothetical.  ViewVC is using (a slightly
> modified[*]) Python ConfigParser in this way.
>
> -- Mike
>
> [*] By default, ConfigParser (well, really the RawConfigParser it
> subclasses) lowercases option names, which can cause username/group
> matching to fail.  So ViewVC's code replaces the optionxform method of
> ConfigParser with a noop lambda function.  (See
> https://docs.python.org/2/library/configparser.html#ConfigParser.RawConfigParser.optionxform
> for official docs.)


Interesting. Of course, our ConfigParser-like implementation already has
the option to treat section names as case-sensitive, and this option is
used ... that's righjt, by the authz file parser.

-- Brane



Re: Crazy idea: changes in WC should share an API with changes in repository

2018-12-03 Thread Julian Foad
Julian Foad wrote on 2018-11-09:
> > Implementation of the "WC delta editor" is in progress.
> 
> Currently factoring out our "copy dir from repos to WC" implementation 

Some more notes on progress.

The way we handle "copy" into the WC is a beast. Untangling this is by far the 
most complex part of the whole exercise.


* Copy via Checkout

Copying a directory tree from the same repository into the WC is currently 
handled by performing a new checkout into a temporary WC, then running a 
WC-to-WC-copy from there to the target location, then deleting the remnants of 
that temporary WC. Ugh.

The essential purpose of the WC-to-WC-copy step is putting down a layer in the 
WC 'working' table to represent the pristine version (aka 'revert-base') of 
this copy.

The temporary WC, ugly though it is, is a fairly well hidden implementation 
detail and so not too worrying.

What is ugly is the way this procedure calls back to the RA layer using a 
full-blown 'checkout' procedure, scanning the WC, setting up a reporter, and 
using it to drive a new 'update' edit.

Instead, at this level in our WC editor, we already know there's an empty spot 
in the WC where the result is going to go, and we know the 'report' is of the 
trivial 'give me the whole subtree URL@REV' variety. We should be able to make 
a callback that doesn't require access to the WC nor the repository but just to 
whatever datastore the caller has available.  The driver of this WC edit might 
be some shelving storage or might be another WC.

Requiring a callback to the RA layer connecting to The Repository might be a 
workable initial version, even though ultimately we would certainly not want to 
require repository access for shelving or unshelving or WC-to-WC copying.


* Single-file vs. Directory

Need to unify. Handling these two cases separately is the cause of a lot of 
complexity duplication and resulting bugs.

Potential solution: Some form of the old "anchor-and-target" idea. Create a 
generic editor wrapper that transforms a request for an editor rooted at a file 
(or a maybe-file) into a request for an editor rooted at its parent directory 
with operations performed on a single target entry.


* Foreign-repo copies:

Currently a separate code path. (Currently the only one that calls the new WC 
editor.)

Unify.


* Externals

r1847834: "Unify how 'copy' processes externals with and without pinning. ... 
Remove the optional 'externals' processing from inside the copy APIs, as there 
was already support for doing it outside. Previously, externals were fetched 
outside the 'copy' API if and only if some externals were to be pinned. Now we 
always use that code path. As a side effect, this makes the notifications 
consistent between the two cases."


* Mergeinfo

Mergeinfo is deleted when we are copying from a foreign repo. (Inconsistency 
bug found and fixed: SVN-4792.)

This should be re-implemented as a wrapper editor that just performs mergeinfo 
stripping.



-- 
- Julian