date:20120130

Re: request to clarify and improve Subversion property name specification

2012-01-30 Thread Philip Martin

Daniel Shahaf danie...@elego.de writes:

 - Send a patch to svn_repos__validate_props() (and make your case that
   it should be applied)

I think the current situation for property names is:

 - the backend FS layer allows any null terminated string as a property
   name

 - the frontend client layer restricts property names to a subset of
   ASCII

I'm still not clear whether Garret wants to relax the client
restrictions, or tighten the server restrictions, or do both.
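
As a rough illustration of what "a subset of ASCII" could mean on the client side, here is a small Python sketch. The exact character set (first character a letter, ':' or '_'; later characters letters, digits, '-', '.', ':' or '_') is an assumption made for illustration and should be checked against svn_prop_name_is_valid() in the source; the function name below is hypothetical.

```python
import re

# Assumed rule, for illustration only: an XML-name-like subset of ASCII.
_PROP_NAME_RE = re.compile(r"[A-Za-z:_][A-Za-z0-9\-.:_]*\Z")

def looks_like_valid_prop_name(name: str) -> bool:
    """Return True if name fits the assumed client-side ASCII subset."""
    return _PROP_NAME_RE.match(name) is not None

print(looks_like_valid_prop_name("svn:ignore"))    # True
print(looks_like_valid_prop_name("my\u00b7prop"))  # False: middle dot is not ASCII
print(looks_like_valid_prop_name("p<>p"))          # False under the assumed rule
```

A backend that accepts any null-terminated string would take all three names, which is the mismatch being discussed.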

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: request to clarify and improve Subversion property name specification

2012-01-30 Thread Branko Čibej

On 30.01.2012 11:14, Philip Martin wrote:
 Daniel Shahaf danie...@elego.de writes:

 - Send a patch to svn_repos__validate_props() (and make your case that
   it should be applied)
 I think the current situation for property names is:

  - the backend FS layer allows any null terminated string as a property
name

  - the frontend client layer restricts property names to a subset of
ASCII

And the HTTP layer has its own implicit restrictions.

 I'm still not clear whether Garret wants to relax the client
 restrictions, or tighten the server restrictions, or do both.

It's always a good idea to have the server validate using the same rules
as the client libs. Not exactly backward-compatible, but I consider it
a bug that the server allows property names that the client does not.

-- Brane

Re: request to clarify and improve Subversion property name specification

2012-01-30 Thread Philip Martin

Branko Čibej br...@apache.org writes:

 On 30.01.2012 11:14, Philip Martin wrote:
  - the backend FS layer allows any null terminated string as a property
name

  - the frontend client layer restricts property names to a subset of
ASCII

 And the HTTP layer has its own implicit restrictions.

The property name gets transferred as an XML name but it appears that
the server does some escaping to allow non-XML-name characters.  If I
use 'svnadmin load' to set a property with a name 'p<>p' then I can
still checkout over http, the XML sent over the wire is:

<S:set-prop name="p&lt;&gt;p"></S:set-prop>
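
The escaping described above can be reproduced with a short Python sketch. It assumes a property name containing '<' and '>' (the name used here is only an example), and it shows generic XML attribute escaping from the standard library, not mod_dav_svn's actual code path.

```python
from xml.sax.saxutils import quoteattr

def set_prop_element(name: str) -> str:
    """Render a set-prop element with the name carried as an escaped attribute."""
    return "<S:set-prop name=%s></S:set-prop>" % quoteattr(name)

# Example name with characters that are not allowed in an XML name.
print(set_prop_element("p<>p"))
# <S:set-prop name="p&lt;&gt;p"></S:set-prop>
```

Because the name travels as attribute text rather than as an element name, any character can be carried this way once it is escaped.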


Re: request to clarify and improve Subversion property name specification

2012-01-30 Thread Philip Martin

Philip Martin philip.mar...@wandisco.com writes:

 Branko Čibej br...@apache.org writes:

 On 30.01.2012 11:14, Philip Martin wrote:
  - the backend FS layer allows any null terminated string as a property
name

  - the frontend client layer restricts property names to a subset of
ASCII

 And the HTTP layer has its own implicit restrictions.

 The property name gets transferred as an XML name but it appears that
 the server does some escaping to allow non-XML-name characters.  If I
 use 'svnadmin load' to set a property with a name 'p<>p' then I can
 still checkout over http, the XML sent over the wire is:

 <S:set-prop name="p&lt;&gt;p"></S:set-prop>

That allows the client to receive the property from the server.  The
client doesn't allow me to manipulate the property and is not capable of
sending it back to the server.  If I use sqlite3 to effect a local
property change the client attempts to send the unescaped name to the
server:

D:setD:propC:pp v2v2/C:pp/D:prop/D:set
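
Why an unescaped name breaks the request can be checked with any XML parser. The sketch below assumes the property name contains '<' and '>' and is used directly as an element name; the namespace URI for the C: prefix is made up for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

NS = 'xmlns:D="DAV:" xmlns:C="urn:example:custom-props"'  # C: URI is hypothetical

ok = "<D:set %s><D:prop><C:pp>v2</C:pp></D:prop></D:set>" % NS
broken = "<D:set %s><D:prop><C:p<>p>v2</C:p<>p></D:prop></D:set>" % NS

print(is_well_formed(ok))      # True
print(is_well_formed(broken))  # False: '<' cannot appear in an element name
```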


Re: request to clarify and improve Subversion property name specification

2012-01-30 Thread Branko Čibej

On 30.01.2012 12:06, Philip Martin wrote:
 Philip Martin philip.mar...@wandisco.com writes:

 Branko Čibej br...@apache.org writes:

 On 30.01.2012 11:14, Philip Martin wrote:
  - the backend FS layer allows any null terminated string as a property
name

  - the frontend client layer restricts property names to a subset of
ASCII
 And the HTTP layer has its own implicit restrictions.
 The property name gets transferred as an XML name but it appears that
 the server does some escaping to allow non-XML-name characters.  If I
 use 'svnadmin load' to set a property with a name 'p<>p' then I can
 still checkout over http, the XML sent over the wire is:

 <S:set-prop name="p&lt;&gt;p"></S:set-prop>
 That allows the client to receive the property from the server.  The
 client doesn't allow me to manipulate the property and is not capable of
 sending it back to the server.  If I use sqlite3 to effect a local
 property change the client attempts to send the unescaped name to the
 server:

 D:setD:propC:pp v2v2/C:pp/D:prop/D:set

QED :)

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Stefan Sperling

On Sun, Jan 29, 2012 at 07:38:44PM +0900, Hiroaki Nakamura wrote:
 Hi folks!
 
 I read the note about unicode compositions for filenames
 http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
 and would like to drive the discussion.

Hi,

I am very happy to hear that you want to work towards getting this
problem fixed. Thank you for your help!

I've just re-read the unicode-composition-for-filenames notes.
I think they are a bit outdated. For instance, they still talk about
the 1.6 working copy format. They also don't clearly explain the problems
with backwards compatibility we're facing here.

We won't be able to apply your patch as it is. The problem is that
it can break operation for some existing repositories and working
copies.

Generally, I think that writing code that implements a solution for
this problem is not hard, no matter what the solution is.
The real challenge lies in finding a solution that is backwards
compatible with existing repositories and working copies.

I will explain what I mean by giving examples below.
But first, let's recap the basic problem, if only so others can more
easily follow this discussion.

As you know, in Unicode, some characters can be represented in two distinct
ways: pre-composed form (NFC) and de-composed form (NFD).
For instance, the letter ä (a umlaut) can be represented by Unicode
code point 0x00E4 ( ä ), which is the pre-composed form, or by code
point 0x0061 ( a ) followed by code point 0x0308 ( ̈  ), which is the
de-composed form.
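
The two representations are easy to inspect with Python's standard unicodedata module. This is only an illustration of the Unicode behaviour, not Subversion code.

```python
import unicodedata

nfc = "\u00e4"    # pre-composed a umlaut
nfd = "a\u0308"   # 'a' followed by COMBINING DIAERESIS

print(nfc == nfd)                                # False
print(unicodedata.normalize("NFD", nfc) == nfd)  # True
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
print(nfc.encode("utf-8"), nfd.encode("utf-8"))  # b'\xc3\xa4' b'a\xcc\x88'
```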

This is a basic property of Unicode. It simply contains both ways of
representing these characters in its character tables.
I.e. any byte-string representation of Unicode, be it UTF-8, UTF-16,
must also be able to represent both ways of encoding such characters.
So when filenames are given in Unicode, a filename may contain any
combination of NFC and NFD characters.

Because Subversion never normalises filenames to one form or the other,
the space of all possible filenames in a Subversion repository or working
copy contains a large amount of redundancy. There are many filenames which
look the same to the user but differ in terms of the Unicode code points
used to represent them.

For instance, imagine a filename containing 3 a umlaut characters
and otherwise only characters from the ASCII set.
There are 8 (2^3) different ways of representing this filename in Unicode,
and hence 8 different UTF-8 byte strings which can be used in the repository
or working copy to represent what is, from the user's point of view,
the same filename.
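
The count can be verified with a few lines of Python. The filename template is made up; the sketch enumerates every mix of the two spellings and checks that all of them collapse to one name after normalisation.

```python
import itertools
import unicodedata

FORMS = ("\u00e4", "a\u0308")  # NFC and NFD spellings of a umlaut

def spellings(template: str, count: int):
    """Yield every mix of NFC/NFD spellings for `count` placeholders."""
    for combo in itertools.product(FORMS, repeat=count):
        yield template.format(*combo)

names = list(spellings("{}b{}c{}.txt", 3))  # hypothetical filename
encoded = {n.encode("utf-8") for n in names}
normalised = {unicodedata.normalize("NFC", n) for n in names}
print(len(encoded), len(normalised))  # 8 1
```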

The problem we have on Mac OS X is that when we write any of these
8 different byte strings to the filesystem to name the file, and later
read the filename back from the filesystem (e.g. by opening the parent
directory and asking for a list of files it contains), we will always
receive the name with all a umlaut characters expanded to de-composed
form.

Now, in the working copy meta data (.svn/wc.db) we can use any of 8 forms
of the filename. If we don't use NFD for all characters in the filename,
the filename read from disk may fail to match any name stored in meta data.
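
The mismatch can be sketched as a failed dictionary lookup. The metadata table and filename below are hypothetical, and the NFD step only simulates what a de-composing filesystem hands back; the lookup fails whenever the stored spelling differs from the one read from disk.

```python
import unicodedata

# Hypothetical metadata table keyed by the name as it was stored (NFC here).
stored = {"\u00e4rger.txt": "node-1"}

# Simulate a filesystem that hands names back fully de-composed.
from_disk = unicodedata.normalize("NFD", "\u00e4rger.txt")

print(from_disk in stored)                                # False
print(unicodedata.normalize("NFC", from_disk) in stored)  # True
```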

Let's simplify the discussion a bit by assuming only two possible ways
of encoding a filename: One with all characters normalised to NFC, and
one with all characters normalised to NFD. We don't really need to
consider the mixed forms for the purpose of this discussion (though it
helps to keep in mind that they exist).

So let's talk about what would happen if we applied your patch.

Let's say I have a working copy which contains filenames normalised
to NFD, as is the case on Mac OS X. The server gets upgraded to a new
release of Subversion which contains your patch. This means the server
will now send all paths as NFC. Let's say there are changes made to a
file which has 3 a umlaut characters in its name. When I run 'svn update'
my client will try to find the NFC form of the name in its meta-data,
and fail to locate it because the file was stored as NFD.

So this means your patch will break compatibility with the working copy.
Therefore, we would need to provide an upgrade path for those working
copies. E.g. 'svn upgrade' could be modified to normalise all filenames
stored in the DB to NFC. Problem solved.

But now comes the next problem. Given a filename in NFC which we read from
meta data, how can we locate the corresponding on-disk file if its form
is not NFC? We could of course rename the on-disk file. Except this
won't work on Mac OS X unless we decide to use NFD encoding. So we could
decide to also use NFD everywhere -- but this would break as soon as
some other operating system decides to normalise to NFC, so it's not a
good solution. We could also open the parent directory, read all the
filenames within it, normalise them all, and then search the resulting
list. This works, except if a name exists twice, once in NFC form and once
in NFD form. We'd somehow have to solve the name collision in the filesystem.

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Branko Čibej

On 30.01.2012 13:30, Stefan Sperling wrote:
 On Sun, Jan 29, 2012 at 07:38:44PM +0900, Hiroaki Nakamura wrote:
 Hi folks!

 I read the note about unicode compositions for filenames
 http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
 and would like to drive the discussion.
 Hi,

 I am very happy to hear that you want to work towards getting this
 problem fixed. Thank you for your help!

 I've just re-read the unicode-composition-for-filenames notes.
 I think they are a bit outdated. For instance, they still talk about
 the 1.6 working copy format. They also don't clearly explain the problems
 with backwards compatibility we're facing here.

[...]

We have to track two distinct normalizations, the internal (wc.db,
repos) form, most likely NFC, and the working copy, on-disk form. This
last will depend on the host system; most likely NFD on Mac OS and NFC
everywhere else. The on-disk normalization needs to happen before
conversion to the system encoding, of course.
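
A minimal sketch of the two-form idea, assuming NFC as the internal form; the function names are hypothetical and the real code would live in the C libraries. The point is the ordering: normalise first, then convert to the system encoding.

```python
import unicodedata

INTERNAL_FORM = "NFC"  # assumed form for wc.db and the repository

def to_internal(name: str) -> str:
    """Name as it would be stored in wc.db or the repository."""
    return unicodedata.normalize(INTERNAL_FORM, name)

def to_on_disk(name: str, disk_form: str, encoding: str = "utf-8") -> bytes:
    """Name as it would be handed to the host filesystem."""
    # Normalise first, then convert to the system encoding.
    return unicodedata.normalize(disk_form, name).encode(encoding)

print(to_on_disk("\u00e4", "NFD"))   # b'a\xcc\x88'
print(to_on_disk("a\u0308", "NFC"))  # b'\xc3\xa4'
```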

libsvn_repos should do its own normalization to NFC because we can't
trust old clients to do it right.
Doing a dump/reload cycle should then be sufficient to upgrade the
repository, and probably the only viable one, too.

For working copies, we may want to teach svn upgrade to do the on-disk
and wc.db normalization dance. Clearly, client-side normalization
requires a WC format bump, but it need not be automatic.

We should probably give serious thought to using the restricted
normalisation forms (NFKC and NFKD) and tell people who want proper
Unicode Roman numerals in their file names to think again. :)
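
For reference, this is what the K forms do to a Roman numeral character, shown with Python's unicodedata module. The filename is made up.

```python
import unicodedata

name = "chapter-\u2163.txt"  # U+2163 ROMAN NUMERAL FOUR

print(unicodedata.normalize("NFC", name) == name)  # True: NFC keeps the numeral
print(unicodedata.normalize("NFKC", name))         # chapter-IV.txt
```

With NFKC the single numeral character is replaced by the plain letters I and V, so the original spelling cannot be recovered afterwards.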

-- Brane

Re: request to clarify and improve Subversion property name specification

2012-01-30 Thread Philip Martin

Philip Martin philip.mar...@wandisco.com writes:

 Branko Čibej br...@apache.org writes:

 On 30.01.2012 11:14, Philip Martin wrote:
  - the backend FS layer allows any null terminated string as a property
name

  - the frontend client layer restricts property names to a subset of
ASCII

 And the HTTP layer has its own implicit restrictions.

 The property name gets transferred as an XML name but it appears that
 the server does some escaping to allow non-XML-name characters.  If I
 use 'svnadmin load' to set a property with a name 'p<>p' then I can
 still checkout over http, the XML sent over the wire is:

 <S:set-prop name="p&lt;&gt;p"></S:set-prop>

That's for neon, when using serf the checkout fails:

svn: E175009: XML parsing failed: (207 Multi-Status)


AW: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Markus Schaber

Hi,

From: Stefan Sperling [mailto:s...@elego.de] 
 On Sun, Jan 29, 2012 at 07:38:44PM +0900, Hiroaki Nakamura wrote:
 I read the note about unicode compositions for filenames 
 http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
  and would like to drive the discussion.
[...]
 We could also open the parent directory, read all the filenames within it, 
 normalise them all, and then search the resulting list. This works, except if 
 a name exists twice, once in NFC form and once in NFD form. We'd somehow have 
 to solve the name collision in the filesystem.

This sounds astonishingly similar to the lower/upper case problem of UN*X vs. 
Mac/Win.

 But it gets worse. Recall the filesystem name collision problem mentioned 
 above. This problem can also happen in the repository filesystem! For 
 instance, assume that in the repository there already exist two filenames, 
 one NFD, the other NFC, but they both are actually the same name.

The same here. So whatever solution is found for one of those problems could 
also help to solve (or mitigate) the other problem.
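
To make the parallel concrete, here is a small Python sketch that finds colliding names in a directory listing under an arbitrary key function. The listing is invented, and case folding and NFC normalisation are just two possible keys; nothing here reflects how the client actually handles case conflicts.

```python
import unicodedata
from collections import defaultdict

def collisions(names, key):
    """Group names by key(name) and return the groups with more than one member."""
    groups = defaultdict(list)
    for n in names:
        groups[key(n)].append(n)
    return [g for g in groups.values() if len(g) > 1]

# Hypothetical directory listing.
names = ["README", "Readme", "\u00e4.txt", "a\u0308.txt", "notes.txt"]

by_case = collisions(names, str.casefold)
by_form = collisions(names, lambda n: unicodedata.normalize("NFC", n))

print(by_case)       # [['README', 'Readme']]
print(len(by_form))  # 1: the two spellings of the a-umlaut name
```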

 These are the questions which we'll need to answer to solve this issue.
 I honestly do not have good answers. I hope that you will find ways of 
 solving these problems.

Maybe the best solution to this issue is a client-only solution, in a similar 
way the case sensitivity problem is tackled.


Best regards

Markus Schaber
-- 
___
We software Automation.

3S-Smart Software Solutions GmbH
Markus Schaber | Developer
Memminger Str. 151 | 87439 Kempten | Germany | Tel. +49-831-54031-0 | Fax 
+49-831-54031-50

Email: m.scha...@3s-software.com | Web: http://www.3s-software.com 
CoDeSys internet forum: http://forum.3s-software.com
Download CoDeSys sample projects: 
http://www.3s-software.com/index.shtml?sample_projects

Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade 
register: Kempten HRB 6186 | Tax ID No.: DE 167014915 



Re: Implicit keep-alive after reintegrate merge

2012-01-30 Thread Stefan Sperling

On Tue, Jan 24, 2012 at 01:12:39AM +0100, Branko Čibej wrote:
 By the way, I read Stefan's description of why --reintegrate is
 necessary, and after slogging through the unfortunate terminology (2-URL
 merge doesn't mean a thing in CM theory :) and one little bit caught my
 attention:
 
  A sync merge can fill in all the parameters as well, except PATH2.
  However, it needs to do so in a different way. With a sync merge
  PATH1 and PATH2 are the same
 
 I keep reading this in the context of the rest of the reasoning, and my
 reaction is still: WTF? Bogus! This looks like someone /started off/
 with the assumption that a sync merge can take shortcuts where a
 reintegrate merge cannot; but, so sorry, that's just plain nonsense.

Oh, it's not nonsense. And there are no special shortcuts reintegrate
can take. You just misunderstood what I was writing about.

I didn't write about CM theory. I wrote about usage of Subversion.

When using svn, the term 2-URL merge refers to a specific way of
invoking 'svn merge'. It is the most general type of invocation.
All other forms are syntactic sugar which can be represented by
equivalent 2-URL merge invocations.

Consider: svn merge ^/trunk
with mergeinfo on the current dir being: /trunk:2-6
The following 2-URL merge is the equivalent: 
 svn merge ^/trunk@6 ^/trunk@HEAD .
That's all there is to it.
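
The mapping in this example can be written down as a small Python function. It is a deliberately simplified sketch: it handles one contiguous mergeinfo range for a single source, and the function name is made up.

```python
def sync_merge_as_two_url(source, mergeinfo, target="."):
    """Build the 2-URL form of 'svn merge ^SOURCE' from one mergeinfo line.

    Simplification: handles exactly one contiguous range for `source`.
    """
    path, sep, ranges = mergeinfo.partition(":")
    if path != source or not sep or "," in ranges:
        raise ValueError("sketch handles a single range for the given source only")
    last_merged = ranges.split("-")[-1]
    if not last_merged.isdigit():
        raise ValueError("unexpected range syntax: %r" % ranges)
    return "svn merge ^%s@%s ^%s@HEAD %s" % (source, last_merged, source, target)

print(sync_merge_as_two_url("/trunk", "/trunk:2-6"))
# svn merge ^/trunk@6 ^/trunk@HEAD .
```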

The same applies to reintegrate, BTW. It is a Subversion-specific
concept that might not be represented in CM theory because it is, as you
point out, just a special case of the general merge (you didn't describe
what merge means in your theory so I'm just going to make assumptions).

 The cases are exactly symmetrical, all edge cases apply to both directions
 of the merge, a sync merge can encounter all the complications of a
 reintegrate merge. I'll be bold enough to assume that the keep-alive
 song-and-dance is a direct result of these invalid assumptions.
 
 Well, at least this answers the question of whether it's the model or
 the implementation that's wrong ... the answer is, that the
 implementation is misinterpreting the model. :)

Huh? I don't follow. Which model do you think is being misinterpreted?
Does the model you have in mind cleanly map to what Subversion can
represent?

 Just to make sure it's understood: When you create a branch, the origin
 of the branch is an interesting bit of information. However, for
 merging, it is entirely irrelevant if branch A was created from B or the
 other way around. To illustrate:
 
 (1)
            +--- b@r2 ------ b@r3 ---
  (branch) /                   | (merge)
          /                    v
 --- a@r1 ---------------------+--- a@r4 ---

 (2)
 --- a@r1 ---------- a@r3 ---
          \            | (merge)
  (branch) \           v
            +--- b@r2 -+--- b@r4 ---
 
 
 Cases (1) and (2) are exactly equivalent as far as the merge algorithm
 is concerned, but Subversion calls the first a reintegrate merge and the
 second a sync merge, and treats them differently, as if branch (a) were
 somehow special. It's not.

If you always use the 2-URL merge syntax all the abstractions go away
and you'll have symmetry.

 (1) svn co a@r4 wc; svn merge b@r2 b@r3 a
 (2) svn co b@r4 wc; svn merge a@r1 a@r3 b

See? Perfectly symmetrical.

Your example is too simple, though.
You only have one change being merged either way, and no cycles.

Generally, we want to avoid spurious conflicts from diff3 which happen
when changes are applied twice because diff3 is not idempotent.
I.e. we break the nice symmetry to work around a limitation of diff3.

In the following case we can avoid spurious conflicts by picking
our parameters carefully:

 (3)
            +-- b@r2 --+-- b@r3 -- b@r4 -- b@r5 ---
  (branch) /           ^                     | (merge 2)
          /            | (merge 1)           v
 --- a@r1 ----------- a@r2 ------------------+-- a@r6 ---

Merge 1 brings a@r2 into b@r2.
Merge 2 brings b@r4 into a@r5.

 (3.1) svn co b@r2 wc; svn merge a@r1 a@r2 b

There are two ways of performing merge 2.
The first is symmetrical and re-applies a@r2 to a@r6, via b@r3,
with possible spurious conflicts from diff3:

 (3.2 a) svn co a@r5 wc; svn merge b@r2 b@r5 a

The second does not re-apply a@r2, so there are no possible conflicts
from diff3 because of a@r2/b@r3. Only b@r4 can conflict.

 (3.2 b) svn co a@r5 wc; svn merge b@r3 b@r5 a

The result is the same, however.
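
The difference between (3.2 a) and (3.2 b) can be shown with a toy model. The Python sketch below is a line-wise three-way merge for equal-length files, which is an assumption made for brevity and not how diff3 works internally; the one-line file contents are invented.

```python
def merge3(base, mine, theirs):
    """Line-wise three-way merge for equal-length files (toy model, not diff3)."""
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == b:
            merged.append(t)          # only "theirs" changed this line
        elif t == b or m == t:
            merged.append(m)          # only "mine" changed, or both agree
        else:
            merged.append(m)
            conflicts.append(i)       # both changed it, differently
    return merged, conflicts

branch_point = ["x"]    # b as first copied from a
a_tip = ["y"]           # a after its own change
b_after_merge1 = ["y"]  # b after merge 1 brought that change over
b_tip = ["z"]           # b after a further edit to the same line

print(merge3(branch_point, a_tip, b_tip))    # like (3.2 a): (['y'], [0])
print(merge3(b_after_merge1, a_tip, b_tip))  # like (3.2 b): (['z'], [])
```

Starting the diff from the branch point re-applies the change that a already has and reports a conflict; starting it after merge 1 does not.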

What we use during --reintegrate is (3.2 b).
You can argue that this approach is broken and we should be using (3.2 a)
for symmetry, and let users deal with spurious conflicts.

But (3.2 b) is always correct and more convenient if diff3 fails to
produce a conflict-free diff when b@r3 is applied to a@r5.
So why not use it?

Alternatively, do you know of a diff3 replacement that is idempotent?

Re: Implicit keep-alive after reintegrate merge

2012-01-30 Thread Stefan Sperling

On Mon, Jan 30, 2012 at 02:23:46PM +0100, Stefan Sperling wrote:
  (3)
             +-- b@r2 --+-- b@r3 -- b@r4 -- b@r5 ---
   (branch) /           ^                     | (merge 2)
           /            | (merge 1)           v
  --- a@r1 ----------- a@r2 ------------------+-- a@r6 ---
 
 Merge 1 brings a@r2 into b@r2.
 Merge 2 brings b@r4 into a@r5.

Hmpf. I tweaked this before hitting 'send' and some of the numbers
are off. You get the idea:

  (3)
             +-- b@r2 --+-- b@r4 -- b@r5 -- b@r6 ---
   (branch) /           ^                     | (merge 2)
           /            | (merge 1)           v
  --- a@r1 ----------- a@r3 ------------------+-- a@r7 ---

Re: Implicit keep-alive after reintegrate merge

2012-01-30 Thread Stefan Sperling

On Mon, Jan 30, 2012 at 02:23:46PM +0100, Stefan Sperling wrote:
 What we use during --reintegrate is (3.2 b).

And here I'm catching myself spreading misinformation again.

There is another tweak we use during reintegrate.
Consider the graph again (fixed version):

  (3)
             +-- b@r2 --+-- b@r4 -- b@r5 -- b@r6 ---
   (branch) /           ^                     | (merge 2)
           /            | (merge 1)           v
  --- a@r1 ----------- a@r3 ------------------+-- a@r7 ---

If we used:

  (3.2 b) svn co a@r6 wc; svn merge b@r4 b@r7 a
 
we'd miss b@r2 during the merge. It won't be applied to branch a.
But we want it in the changeset.

So what really happens is:

  (3.2 b) svn co a@r6 wc; svn merge a@r4 b@r6 a

Note that a is compared to b, rather than b against its former self.
This delta includes b@r2 because this change isn't yet on branch a.

I'll readily admit that my initial statement about reintegrate
taking no shortcuts may not be correct, depending on the definition
of shortcut. However, this is all about driving diff3 in a way that
produces results without spurious conflicts, rather than a general
mistake in applying some CM theory merge model.

Re: [RFC] Server Dictated Configuration

2012-01-30 Thread C. Michael Pilato

On 01/27/2012 04:38 PM, Paul Burba wrote:
 Now let's say we implement inheritable properties as I described in
 the wiki and want to use an inheritable property to supplement the
 existing mechanisms for svn:ignores/global-ignores.  Isn't that as
 simple as this?
 
 4) We add a new reserved inheritable property svn:i:ignores which has
 the exact same format as svn:ignore.
 
 5) Again assuming a WC operation, we take a path's inherited (or
 explicit) svn:i:ignores property value, the svn:ignore property (if
 any) on a path's parent directory, and the global-ignores runtime
 config value and append all three together to get the final answer on
 what to ignore.
 
 I take it you view this as insufficient?

The question wasn't really aimed my way, but this seems perfectly sufficient
to me.
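
Step 5 of the quoted proposal could look roughly like the Python sketch below. The pattern values are invented, svn:i:ignores is only the name proposed in this thread, and splitting on whitespace is a simplification that would mishandle patterns containing spaces.

```python
import fnmatch

def effective_ignores(inherited, parent_ignore, global_ignores):
    """Append the three pattern sources together, as step 5 describes."""
    patterns = []
    for source in (inherited, parent_ignore, global_ignores):
        patterns.extend(source.split())
    return patterns

def is_ignored(name, patterns):
    """True if the file name matches any of the glob patterns."""
    return any(fnmatch.fnmatchcase(name, p) for p in patterns)

patterns = effective_ignores(
    "*.class\n*.jar",   # inherited svn:i:ignores (hypothetical value)
    "build\n",          # svn:ignore on the parent directory (hypothetical)
    "*.o *.lo *~",      # global-ignores from the runtime config (hypothetical)
)
print(patterns)                           # ['*.class', '*.jar', 'build', '*.o', '*.lo', '*~']
print(is_ignored("Main.class", patterns)) # True
print(is_ignored("Main.java", patterns))  # False
```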

-- 
C. Michael Pilato cmpil...@collab.net
CollabNet  www.collab.net  Distributed Development On Demand




Re: request to clarify and improve Subversion property name specification

2012-01-30 Thread Stefan Sperling

On Mon, Jan 30, 2012 at 06:31:19AM -0800, Garret Wilson wrote:
 Think about it in terms of XML: There is a specification for the XML
 API, the DOM: http://www.w3.org/TR/DOM-Level-3-Core/core.html .
 However, the API specification still depends on the definition of
 what XML itself is (e.g. what constitutes an XML name, what types of
 nodes there are, what types of children nodes can have):
 http://www.w3.org/TR/REC-xml/ . Many people I've communicated with
 in the Subversion community seem to think, by way of analogy, that
 the DOM specification (just in code comments) is all that's
 needed---there's no point in taking the time to write the
 specification for XML itself. I wholeheartedly disagree.

It may seem like that if all you look at is the code and this list.
It seems you haven't seen some of the files in this folder yet:
  https://svn.apache.org/repos/asf/subversion/trunk/notes
Or the acknowledgements sections of webdav and deltav RFCs.

Granted, we might not have a precise spec for property names.
I agree very much that it would be useful to have one.

I guess no spec exists because property names of all things appear as
something quite simple and straightforward. The complexities aren't
readily apparent until you run into them. You have the benefit of hindsight.

Re: Implicit keep-alive after reintegrate merge

2012-01-30 Thread Mark Mielke

Stefan: I believe you are agreeing that the merge in either direction is 
the same complexity, and describing how --reintegrate moves the 
responsibility for the complexity to the owner of the private branch, 
and requires resolution before submission. I think you are saying this 
is a good thing because diff3 isn't perfect.


In my experience:

No merge is perfect. The situation is either complex, or it is not 
complex - and moving resolution to the private branch is a matter of 
process - not a matter of algorithm. That is, it is the responsibility 
of the team to decide that we will always make sure our private branch 
is up to date before submitting to the integration stream.


In particular, if I have a stream with 100 users working in parallel, 
all submitting on a regular basis because this is their full time paid 
job to work on a piece of software, it may be a race to actually get the 
submission - depending on if the algorithm can detect whether the same 
files are being changed or not.


The first thing the tool can do to be genuinely useful in this 
situation, is to accept some of the responsibility of detecting whether 
or not the race is one of these "diff3 is not idempotent" situations, 
and providing automatic handling. If the case has been hit, then 
--reintegrate could be used as a form of special error checking where 
it does the same as merge, except in the case that the merge has a 
true conflict with any particular element of the change set (as opposed 
to a potential conflict with the end result), where the results of diff3 
would need to be trusted, then it could bail and provide the user with 
the information required to resolve the conflict locally before submission.


The second thing the tool can do to be genuinely useful in this 
situation, is to allow for this check to be overridden. If I didn't 
trust diff3 - I wouldn't use merges at all. Sometimes a source 
management tool just needs to help me resolve conflicts. Especially with 
merge tracking and intelligent designer workflows, many cases of so-called 
"conflicts" touch unrelated lines of code, and it *is* safe to 
complete the merge, even to the integration stream. I should have the 
ability to choose to do this, rather than race for submission with 100 
other users.


The worst thing the tool can do is to declare that diff3 is not idempotent 
and therefore should be disabled during --reintegrate. Yuck. This is a 
partial solution and at least as I understand it - it is even dangerous. 
What happens if I use --reintegrate in a situation that actually does 
require merge resolution? Will every situation be blocked? Or will it 
take --reintegrate as a license to overwrite results, trusting that I 
can do all the necessary conflict checking myself? I have seen nothing 
so far that allows me to conclude that architecturally, Subversion 
requires the --reintegrate behaviour. It's a short cut in providing a 
complete branch merging solution for users of the system. Somebody 
started work on the canvas, and then drafted in the last corner rather 
than finish it. :-)


Cheers,
mark


On 01/30/2012 08:23 AM, Stefan Sperling wrote:
The same applies to reintegrate, BTW. It is a Subversion-specific 
concept that might not be represented in CM theory because it is, as 
you point out, just a special case of the general merge (you didn't 
describe what merge means in your theory so I'm just going to make 
assumptions).



Just to make sure it's understood: When you create a branch, the origin
of the branch is an interesting bit of information. However, for
merging, it is entirely irrelevant if branch A was created from B or the
other way around. To illustrate:

 (1)
            +--- b@r2 ------ b@r3 ---
  (branch) /                   | (merge)
          /                    v
 --- a@r1 ---------------------+--- a@r4 ---

 (2)
 --- a@r1 ---------- a@r3 ---
          \            | (merge)
  (branch) \           v
            +--- b@r2 -+--- b@r4 ---


Cases (1) and (2) are exactly equivalent as far as the merge algorithm
is concerned, but Subversion calls the first a reintegrate merge and the
second a sync merge, and treats them differently, as if branch (a) were
somehow special. It's not.

If you always use the 2-URL merge syntax all the abstractions go away
and you'll have symmetry.

  (1) svn co a@r4 wc; svn merge b@r2 b@r3 a
  (2) svn co b@r4 wc; svn merge a@r1 a@r3 b

See? Perfectly symmetrical.

Your example is too simple, though.
You only have one change being merged either way, and no cycles.

Generally, we want to avoid spurious conflicts from diff3 which happen
when changes are applied twice because diff3 is not idempotent.
I.e. we break the nice symmetry to work around a limitation of diff3.

In the following case we can avoid spurious conflicts by picking
our parameters carefully:

  (3)
             +-- b@r2 --+-- b@r3 -- b@r4 -- b@r5 ---
   (branch) /           ^                     | (merge 2)

Re: request to clarify and improve Subversion property name specification

2012-01-30 Thread C. Michael Pilato

On 01/29/2012 02:14 PM, Garret Wilson wrote:
 On 1/29/2012 10:55 AM, Branko Čibej wrote:
 ... I can't help wondering why you didn't ask about valid property names
 /before/ you created a bunch of invalid ones. Sounds like you made one too
 many assumptions. 
 
 Wait, seriously? You're saying that, whenever there is an API call and I
 pass something to it and it comes back with no errors, that nevertheless I
 should spend days asking on various lists just to make sure that the values
 I sent to the API really were OK?? Surely you jest.
 
 The appropriate thing to do would be to consult the Subversion
 specification. But there is no such specification.

You're right, Garret, there is no specification.  There is, however, a book.
 The original version of the book (finished in 2004) contained the following
in the second paragraph of the Properties section:

Generally speaking, the names and values of the properties can be whatever
you want them to be, with the constraint that the names must be
human-readable text.[1]

That statement remained unchanged in subsequent book versions until the 1.5
version (which came out in 2008), at which time we tightened up the
definition of human-readable text a bit:

Generally speaking, the names and values of the properties can be whatever
you want them to be, with the constraint that the names must contain only
ASCII characters.[2]

Now, even that most recent definition isn't quite good enough.  After all,
newlines and tab spaces are ASCII characters, too.  But I certainly don't
see how either of the existing forms of this statement could be construed to
indicate that non-ASCII middle-dot characters in property names were an
intentionally supported use-case.  :-)
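
The gap in the "only ASCII characters" wording is easy to demonstrate. The Python sketch below contrasts a plain ASCII test with a stricter printable-ASCII test; both helper names are hypothetical and neither is claimed to be the rule Subversion enforces.

```python
def ascii_only(name: str) -> bool:
    """The literal reading of 'names must contain only ASCII characters'."""
    return name.isascii()

def printable_ascii(name: str) -> bool:
    """A stricter reading that also rejects control characters and empty names."""
    return name != "" and name.isascii() and name.isprintable()

for candidate in ("svn:ignore", "line\nbreak", "tab\there", "mid\u00b7dot"):
    print(repr(candidate), ascii_only(candidate), printable_ascii(candidate))
```

Names with a newline or a tab pass the first test but not the second, while the middle-dot name fails both.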

Still, you've made some very valid points in this thread, specifically as
regards how we are perhaps overly strict in some areas of our property
requirements and not consistently strict enough elsewhere.  I've filed book
issue #157[3] toward improving further still the book's documentation of
these requirements.

I hope your participation in this thread hasn't soured your appetite for
Subversion or its improvement.

-- C-Mike

[1] http://svnbook.red-bean.com/en/1.0/ch07s02.html
[2] http://svnbook.red-bean.com/en/1.7/svn.advanced.props.html
[3] http://code.google.com/p/svnbook/issues/detail?id=157



Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Peter Samuelson


  [Stefan Sperling]
  We could also open the parent directory, read all the filenames
  within it, normalise them all, and then search the resulting
  list. This works, except if a name exists twice, once in NFC form
  and once in NFD form. We'd somehow have to solve the name collision
  in the filesystem.

[Markus Schaber]
 This sounds astonishingly similar to the lower/upper case problem of
 UN*X vs. Mac/Win.

There are similarities, but there are some important differences:

- We have to support Mac OS X, which stores all files in NFD.  In the
  upper/lowercase analogy, think of OS X as MS-DOS, which does not
  preserve mixed case at all but always represents files in uppercase.
  Subversion doesn't support MS-DOS and I hope we never need to.  MS
  Windows, OTOH, at least preserves the upper/lowercase distinction
  presented to it when you create a file.  Big difference.

  (I'm not saying OS X is like MS-DOS in other respects.  Just for the
  purpose of the NFC/NFD vs. upper/lower analogy.)

- Also, the Subversion platform has chosen to support files like README
  and Readme that conflict on Windows.  Our reasoning is "if you have
  users on Windows, don't do that".  Most solutions to the NFC/NFD
  problem will affect all platforms, not just one, and we probably
  can't just say "well, don't do that" (we'll need to actually prevent
  it, and somehow deal with existing clients, WCs, and repositories).

Because of those differences, my gut feeling is that we can't treat the
two issues in the same way.

Peter

Ignored ###error### in proplist-count?

2012-01-30 Thread Daniel Shahaf

Running with this patch:
[[[
% $svn diff -x-p
Index: subversion/mod_dav_svn/liveprops.c
===================================================================
--- subversion/mod_dav_svn/liveprops.c  (revision 1237720)
+++ subversion/mod_dav_svn/liveprops.c  (working copy)
@@ -721,6 +721,7 @@ insert_prop_internal(const dav_resource *resource,
 serr = svn_fs_node_proplist(&proplist,
 resource->info->root.root,
 resource->info->repos_path, scratch_pool);
+value = "###error###"; break;
 if (serr != NULL)
   {
 ap_log_rerror(APLOG_MARK, APLOG_ERR, serr->apr_err, 
]]]
and running 'svn proplist -v $URL/to/repos/root', doesn't cause any
error (with either DAV layer).

The code is definitely hit --- I checked this by sticking an abort()
there.

Haven't looked further into this -- perhaps someone else can?

Re: [RFC] Server Dictated Configuration

2012-01-30 Thread Julian Foad

Branko Čibej wrote:

 On 27.01.2012 12:53, Julian Foad wrote:
  Branko Čibej wrote:
  On 27.01.2012 11:50, Julian Foad wrote:
   We need to see how we'd implement a reasonable system of svn:ignores 
  and auto-props using the proposed inheritable properties system.  The 
 ability to see the inherited value and then merge in a child-defined 
  value [...] is essential if we're going to implement these features 
 using properties with semantics like the existing 'svn:ignores'.  [...]
 
  No, you need to give the inherited properties that carry server-dictated
  configuration a different name, don't you think? Whether the merging is
  then done server-side or client-side is almost a bikeshed.

  I'm not quite sure what you mean.  Could you give a specific example?
 
 [...] One way to achieve server-dictated configuration of ignores would 
 be to make the server control the 'global-ignores' [config setting].  
 But for the purposes of exploring inheritable properties, let's ignore the 
 'global-ignores' config setting and assume that we're going to 
 control the ignores through (inherited) properties alone.  [...]
 
 Heh, but I fail to see a semantic difference between the two cases. :)

An inherited properties design implies client-side setting of the
inherited properties, whereas the design for server-dictated
configuration implies that setting will be done server-side by an
administrator.  For either approach, I would ask: how would you go about
setting up a useful hierarchy of ignore patterns?  In the server-side
case, you can say "we'll just start with a simple config file format
and defer that problem; somebody can design a more powerful config
system for the administrator to use, later."  So I asked specifically
about how one would conveniently define ignore-patterns hierarchically
in a generally useful inherited properties design.

 Since the server-dictated global-ignores would only apply to a certain
 subtree in the repository, it would /already/ behave as if it were an
 inherited svn:ignore property, and what's more, would be implicitly
 merged by existing client implementation with any svn:ignore properties
 that subtree happens to contain.

No.  The way I read the proposed 'server-dictated config' scheme, it didn't
include a way to configure different values for 'global-ignores' to apply to
different directories inside the WC, only for transmitting a single value of
'global-ignores' which could depend on the root directory of the WC.

But anyway, my point was to explore how useful the inherited properties idea 
would be in general, using ignore patterns as an example.  If you're suggesting 
that this example of an inherited 'global-ignores' value being augmented by a 
non-inheritable 'svn:ignore' value should serve as a general model for how 
overriding should be done in an inherited properties system, that's a valid 
suggestion but it doesn't look like an elegant one.
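
To make the "inherited value augmented by a child-defined value" model concrete, here is a minimal Python sketch. It is only an illustration under assumed semantics: neither proposal specifies how merging works, and the names effective_ignores and is_ignored are hypothetical. It assumes inherited patterns come first, local svn:ignore-style patterns are appended, and duplicates are dropped.

```python
import fnmatch


def effective_ignores(inherited, local):
    """Combine inherited patterns with locally defined ones.

    Assumed rule (hypothetical): inherited patterns first, then local
    additions, keeping the first occurrence of each pattern.
    """
    merged = []
    for pattern in list(inherited) + list(local):
        if pattern not in merged:
            merged.append(pattern)
    return merged


def is_ignored(name, patterns):
    """True if `name` matches any glob pattern (case-sensitive)."""
    return any(fnmatch.fnmatchcase(name, p) for p in patterns)


patterns = effective_ignores(["*.o", "*.pyc"], ["*.pyc", "build"])
print(patterns)
print(is_ignored("main.o", patterns), is_ignored("main.c", patterns))
```

Note that this additive model has no way for a child to remove an inherited pattern, which is one reason it may not generalize well to other inherited properties.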

- Julian

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Markus Schaber

Hi, Peter,

From: Peter Samuelson [mailto:pe...@p12n.org] 
 [Stefan Sperling]
  We could also open the parent directory, read all the filenames 
  within it, normalise them all, and then search the resulting list. 
  This works, except if a name exists twice, once in NFC form and once 
  in NFD form. We'd somehow have to solve the name collision in the 
  filesystem.

[Markus Schaber]
 This sounds astonishingly similar to the lower/upper case problem of 
 UN*X vs. Mac/Win.

 There are similarities, but there are some important differences:

- We have to support Mac OS X, which stores all files in NFD.  In the
  upper/lowercase analogy, think of OS X as MS-DOS, which does not
  preserve mixed case at all but always represents files in uppercase.
  Subversion doesn't support MS-DOS and I hope we never need to.  MS
  Windows, OTOH, at least preserves the upper/lowercase distinction
  presented to it when you create a file.  Big difference.

The preservation of case does not help that much. A simple "map everything to
lower case when accessing the working copy, and search case-insensitively in
the database" could solve that problem, but the repository can contain files
whose names differ only in case, and then preserving the original case does
not help much either.

- Also, the Subversion platform has chosen to support files like README
  and Readme that conflict on Windows.  Our reasoning is "if you have
  users on Windows, don't do that".  Most solutions to the NFC/NFD
  problem will affect all platforms, not just one, and we probably
  can't just say "well, don't do that" - we'll need to actually prevent
  it (and somehow deal with existing clients, WCs, and repositories).

 Because of those differences, my gut feeling is that we can't treat the two 
 issues in the same way.

There seem to be clients which allow files whose names differ only by encoding.
So the position on Unicode encoding collisions could be the same as on
case-insensitivity collisions (allow in the repository what the most capable
clients allow). My guess is that the fixes for that scenario are rather similar
(mainly client-based, specific to the capabilities of the platform, and "if you
have users on Mac, don't do that"). Of course Mac clients internally need to
map to their normalized encoding in a similar way as is done for case
sensitivity now, and in case of encoding collisions, they've lost (similar to
case collisions on Mac and Windows).

If the position is to disallow files whose name only differs by encoding in the 
repositories, things are a little bit different.

But I think that even this can be solved purely on the client, by only sending 
normalized names to the server for all new objects (imports, additions, copy 
targets, ...), and using the existing encodings for all existing objects.

For existing collisions, which harm work on MacOS, the usual workarounds apply: 
Rename the colliding files via repo-browser or in a more capable client. 
Additionally, we could develop a dump filter tool for name normalization, maybe 
with a switch whether to error out or silently rename on collisions.
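
As a rough idea of what the core of such a normalization filter might do, here is a small Python sketch. It is not an existing Subversion tool: normalize_names, NameCollision, and the ".N" rename suffix are all made up for illustration, and NFC is assumed as the target form. A real filter would also have to rewrite the paths inside the dump stream.

```python
import unicodedata


class NameCollision(Exception):
    """Raised when two names normalize to the same string."""


def normalize_names(names, on_collision="error"):
    """Map each original name to its NFC form.

    on_collision is "error" (raise NameCollision) or "rename"
    (append a numeric suffix until the name is unique).
    """
    used = set()
    mapping = {}
    for name in names:
        target = unicodedata.normalize("NFC", name)
        if target in used:
            if on_collision == "error":
                raise NameCollision(f"{name!r} collides after normalization")
            n = 1
            while f"{target}.{n}" in used:
                n += 1
            target = f"{target}.{n}"
        used.add(target)
        mapping[name] = target
    return mapping


nfc = "f\u00f6\u00f6"        # precomposed o-diaeresis
nfd = "fo\u0308o\u0308"      # 'o' followed by U+0308 combining diaeresis
print(normalize_names([nfc, nfd], on_collision="rename"))
```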

With proper documentation, this will cause the problem to fade out in the 
future, and - in theory - it can be implemented on top of the first one at a 
later time. I don't see any need to change anything on the server (both 
implicit conversion and rejection of invalid encodings would break existing 
clients and working copies). My personal guess is that actual encoding 
collisions are rather rare, and workarounds exist, so servers can start to 
reject invalid encodings with version 2.0, or whatever future version is 
allowed to break compatibility to old clients.


Best regards

Markus Schaber

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Julian Foad

Let me just note some of the main similarities and differences between this
issue of Unicode compositions and the issue of case-sensitivity in file names.

Differences:

  * NFC and NFD look the same when displayed, and most users haven't heard
    of them and don't expect that a computer might treat two
    identical-looking filenames as different.  With letter case, most users
    are aware that some systems treat upper and lower case letters as the
    same while other systems treat them as different, and they learn to
    behave according to the system's rules.

  * The main case-insensitive file systems are case-preserving with no
    normal form, whereas the main system that treats NFC and NFD as
    equivalent (MacOS) chooses one form as the normal form and always
    normalizes the given file name to that form.

Similarities:

  * If two Unicode strings differ only by letter case, on some computer
    systems they refer to the same file, while on other systems they refer
    to different files.  The rules are created by the designers of the
    systems, sometimes explicitly and sometimes implicitly.  Different
    parts of a system can have different rules.  The same applies if two
    Unicode strings differ only by composition.

  * Subversion interoperates with different systems.  When two file names
    that differ only by letter case are transferred from a case-sensitive
    system to a case-insensitive system, they will collide and Subversion
    should handle this in some friendly way.  The same applies if two file
    names differ only by composition.

The differences are important, but the similarities are enough that we should 
be looking for some commonality in the implementation.
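
One possible shape for that commonality, sketched in Python rather than in Subversion's C code: a single "collision key" function where the kind of folding is a parameter. The name collision_key and its flags are hypothetical, and NFC is an arbitrary choice of comparison form.

```python
import unicodedata


def collision_key(name, fold_case=False, fold_composition=False):
    """Return a key such that two names collide when their keys are equal."""
    if fold_composition:
        name = unicodedata.normalize("NFC", name)
    if fold_case:
        name = name.casefold()
        if fold_composition:
            # Case folding can change code points, so normalize again.
            name = unicodedata.normalize("NFC", name)
    return name


nfc = "f\u00f6\u00f6"        # precomposed o-diaeresis
nfd = "fo\u0308o\u0308"      # 'o' followed by U+0308 combining diaeresis

print(collision_key(nfc) == collision_key(nfd))                 # False
print(collision_key(nfc, fold_composition=True)
      == collision_key(nfd, fold_composition=True))             # True
print(collision_key("README", fold_case=True)
      == collision_key("Readme", fold_case=True))               # True
```

The same collision check could then be run with whichever folding the target platform applies, which keeps the case and composition handling in one code path.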

- Julian

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Neels J Hofmeyr

On 01/30/2012 02:00 PM, Markus Schaber wrote:
 Maybe the best solution to this issue is a client-only solution, in a similar 
 way the case sensitivity problem is tackled.

Spinning the client-only thought a bit: Imagine a repos with a un*x user
adding a file called föö. Now an OSX user checks it out and gets the path
normalized to fo:o:.

1. wc.db on OSX's HFS+ file systems has to be aware that the file föö is
stored locally as fo:o:.

2. Whenever the OSX user types in fo:o:, the client must remember that the
repos expects the path for this node to be sent as föö, or the repos will
reply that the node does not exist. It could be solved with a translation
table between the repos and the client, but it remains quite a messy
endeavor, because:

3. New files may be added remotely at any given moment. For example, a path
'föö/bar' is checked out to OSX's fs and becomes 'fo:o:/bar'. Then someone
else adds 'fo:o:/bar' to the repos as well -- we now have two distinct 'bar'
files in the repos that share the same normalized path. Now OSX potentially
mistakes its checked-out 'föö/bar' for the later added 'fo:o:/bar', as that
matches the local path without any de-normalisations... The OSX client
basically has no chance to show conflicting files to its user
simultaneously. Data is hidden.


Thus, OSX admins will want the repository to be able to disallow having
multiple representations of the same normalized path -- not that easy to
achieve, in fact: before accepting a path name from the client, the repos
needs to either cycle through all possible unicode representations or needs
to normalize and compare all existing paths. Normalizing a client's path
before storing in the repos is a no-go, as the client won't be able to find its
nodes later. Probably the best option is to define a given normalization per
repos and then refuse commits that add non-normalized paths, like a
pre-commit hook.
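
A minimal sketch of the check such a hook could perform, in Python. The helper name non_normalized_paths is made up, and the choice of NFC as the per-repository form is an assumption. In a real hook the list of added paths would presumably come from something like 'svnlook changed -t TXN REPOS', which is not shown here.

```python
import unicodedata


def non_normalized_paths(added_paths, form="NFC"):
    """Return the added paths that are not already in the required form."""
    return [p for p in added_paths
            if unicodedata.normalize(form, p) != p]


added = ["trunk/f\u00f6\u00f6",          # NFC: accepted
         "trunk/fo\u0308o\u0308"]        # NFD: would be refused
bad = non_normalized_paths(added)
if bad:
    print("refusing commit; non-normalized paths:", [ascii(p) for p in bad])
```

This refuses rather than rewrites, so the client never ends up with a path name that differs from what it sent.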

On the other hand, an all-un*x shop must be allowed to operate the way they
always did. Their OSs only see byte sequences and don't mess around with
normalization. Say they want to have a folder of differently normalized
representations of the same file for testing *their own* code for unicode
robustness. They should be able to do that. (They obviously can't use OSX's
HFS+ for that, though.)

So, on top of client-only fixes, it would be good to have ways to enforce
certain repository behavior, based on self-imposed policy -- I mean, we
won't have The Subversion Normalization, each admin decides alone.

On 01/30/2012 01:30 PM, Stefan Sperling wrote:
 I am not convinced that it is impossible to fix.

Nicely put :)

~Neels

[[[
fred@mac $ svn co http://svn/repos
A foo
A bar
*** Warning:
You are checking out to an HFS+ file system. Your WC may not accurately
represent this revision. Consider using a different file system!
Continue? (Y/n) Y
A föö
*** File name collision detected. Skipping 'föo:'
*** File name collision detected. Skipping 'fo:ö'
*** File name collision detected. Skipping 'fo:o:'
A baz
fred@mac $
]]]
:P




Re: request to clarify and improve Subversion property name specification

2012-01-30 Thread Daniel Shahaf

Garret Wilson wrote on Mon, Jan 30, 2012 at 06:31:19 -0800:
 On 1/29/2012 11:26 PM, Daniel Shahaf wrote:
 ...
 
 - Publish your properties migration code for others to reuse.
 
 Done:
 
 https://...

Thanks.

 [1] If you answer "In a specification" I'll ask how it would relate to
 the existing API docs.
 
 The mythical Subversion Specification is a completely different
 animal than an API specification. After all, there can be (and are)
 different APIs (e.g. DAV+SVN, JavaHL, SVNKit). The APIs should all
 follow the Subversion Specification, which is agnostic to any API.
 One of the biggest disconnects in the Subversion community seems to
 be this idea that some source code comments of an API substitute
 for a specification of Subversion itself---the framework.

The C API is not at equal standing with the wire protocol or SVNKit.

Yes, property names appear everywhere (all public API layers, in the
internals of every library), but the documentation of what is a valid
property name (beyond the type-safety) appears only in one place.  That
could be improved.
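
For readers who want something concrete, here is a Python sketch of the kind of rule being discussed. It reflects my understanding of the client-side check (the C API's svn_prop_name_is_valid()): first character an ASCII letter, ':' or '_', and the remaining characters ASCII letters, digits, '-', '.', ':' or '_'. Treat that as an assumption and check the API docs; the Python function name is made up.

```python
import string

# Assumed rule, see the note above; not copied from the Subversion sources.
_FIRST_CHARS = set(string.ascii_letters + ":_")
_OTHER_CHARS = _FIRST_CHARS | set(string.digits + "-.")


def looks_like_valid_prop_name(name):
    """Check a property name against the assumed ASCII-subset rule."""
    if not name or name[0] not in _FIRST_CHARS:
        return False
    return all(ch in _OTHER_CHARS for ch in name[1:])


for candidate in ["svn:ignore", "my_prop-1.0", "1abc", "my prop", "f\u00f6\u00f6"]:
    print(repr(candidate), looks_like_valid_prop_name(candidate))
```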

Re: [RFC] Server Dictated Configuration

2012-01-30 Thread Branko Čibej

On 30.01.2012 17:05, Julian Foad wrote:

 No.  The way I read the proposed 'server-dictated config' scheme, it didn't
 include a way to configure different values for 'global-ignores' to apply to
 different directories inside the WC, only for transmitting a single value of
 'global-ignores' which could depend on the root directory of the WC.

Huh? How does that make sense in, e.g., the ASF repository?

-- Brane

Re: [RFC] Server Dictated Configuration

2012-01-30 Thread Paul Burba

On Mon, Jan 30, 2012 at 11:05 AM, Julian Foad
julianf...@btopenworld.com wrote:
 Branko Čibej wrote:

 On 27.01.2012 12:53, Julian Foad wrote:
  Branko Čibej wrote:
  On 27.01.2012 11:50, Julian Foad wrote:
   We need to see how we'd implement a reasonable system of svn:ignores
  and auto-props using the proposed inheritable properties system.  The
 ability to see the inherited value and then merge in a child-defined
  value [...] is essential if we're going to implement these features
 using properties with semantics like the existing 'svn:ignores'.  [...]

  No, you need to give the inherited properties that carry server-dictated
  configuration a different name, don't you think? Whether the merging is
  then done server-side or client-side is almost a bikeshed.

  I'm not quite sure what you mean.  Could you give a specific example?

 [...] One way to achieve server-dictated configuration of ignores would
 be to make the server control the 'global-ignores' [config setting].
 But for the purposes of exploring inheritable properties, let's ignore the
 'global-ignores' config setting and assume that we're going to
 control the ignores through (inherited) properties alone.  [...]

 Heh, but I fail to see a semantic difference between the two cases. :)

 An inherited properties design implies client-side setting of the
 inherited properties, whereas the design for server-dictated
 configuration implies that setting will be done server-side by an
 administrator.  For either approach, I would ask: how would you go about
 setting up a useful hierarchy of ignore patterns?  In the server-side
 case, you can say "we'll just start with a simple config file format
 and defer that problem; somebody can design a more powerful config
 system for the administrator to use, later."  So I asked specifically
 about how one would conveniently define ignore-patterns hierarchically
 in a generally useful inherited properties design.

 Since the server-dictated global-ignores would only apply to a certain
 subtree in the repository, it would /already/ behave as if it were an
 inherited svn:ignore property, and what's more, would be implicitly
 merged by existing client implementation with any svn:ignore properties
 that subtree happens to contain.

 No.  The way I read the proposed 'server-dictated config' scheme, it didn't
 include a way to configure different values for 'global-ignores' to apply to
 different directories inside the WC, only for transmitting a single value of
 'global-ignores' which could depend on the root directory of the WC.

That is incorrect, the server dictated configuration proposal
(http://wiki.apache.org/subversion/ServerDictatedConfiguration)
supports different configuration values by path:

[[[
Behavioral specification

The high-level behavior for server-dictated configuration is
relatively simple: the repository maintains a list of configuration
parameters and values which, as necessary, the server provides to the
client. The client, then, behaves in accordance with the
server-dictated configuration.

Subversion could recognize multiple levels of possible hierarchy in
the server-side configuration: server-wide, per repository, or per
repository-path. The current plan is to allow configuration at the
most granular level, per repository-path.
]]]
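
To illustrate what "per repository-path" configuration could mean in practice, here is a small Python sketch. The wiki text quoted above does not say how values at different paths combine, so the rule used here (walk from the root down, deeper paths override) is purely an assumption, and effective_config and the sample paths are hypothetical.

```python
def effective_config(path_configs, path):
    """Merge settings from the repository root down to `path`.

    Assumed rule: settings on deeper paths override those on parents.
    """
    merged = dict(path_configs.get("/", {}))
    current = ""
    for part in (p for p in path.strip("/").split("/") if p):
        current += "/" + part
        merged.update(path_configs.get(current, {}))
    return merged


configs = {
    "/": {"global-ignores": "*.o"},
    "/subversion/trunk": {"global-ignores": "*.o *.lo"},
}
print(effective_config(configs, "/subversion/trunk/notes"))
print(effective_config(configs, "/httpd"))
```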

Paul

 But anyway, my point was to explore how useful the inherited properties idea 
 would be in general, using ignore patterns as an example.  If you're 
 suggesting that this example of an inherited 'global-ignores' value being 
 augmented by a non-inheritable 'svn:ignore' value should serve as a general 
 model for how overriding should be done in an inherited properties system, 
 that's a valid suggestion but it doesn't look like an elegant one.

 - Julian

Re: [PATCH] Fix a bug with property validation logic during 'svnadmin load'

2012-01-30 Thread C. Michael Pilato

On 01/27/2012 02:10 AM, vijay wrote:
 Fix the helper function 'change_rev_prop' to use functions which perform
 validation of the property value if 'validate_props' is set. Otherwise,
 bypass those checks.
 
 * subversion/libsvn_repos/load-fs-vtable.c
   (change_rev_prop): Do the property validation indeed if 'validate_props' is
 set.
 
 Patch by: Vijayaguru G vijay{_AT_}collab.net

Committed:

   Sending        subversion/libsvn_repos/load-fs-vtable.c
   Transmitting file data .
   Committed revision 1237779.

Thanks, Vijay.

-- 
C. Michael Pilato cmpil...@collab.net
CollabNet  www.collab.net  Distributed Development On Demand




Re: [RFC] Server Dictated Configuration

2012-01-30 Thread Julian Foad

Paul Burba wrote:

 Julian Foad wrote:
 [...] The way I read the proposed 'server-dictated config' scheme, 
 it didn't include a way to configure different values for 
 'global-ignores' to apply to different directories inside the WC,
 [...]
 
 That is incorrect, the server dictated configuration proposal
 (http://wiki.apache.org/subversion/ServerDictatedConfiguration)
 supports different configuration values by path:

Oh, I'm sorry, I misremembered that.

- Julian

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Stefan Sperling

On Tue, Jan 31, 2012 at 01:42:21AM +0900, Hiroaki Nakamura wrote:
 2012/1/30 Stefan Sperling s...@elego.de:
  My friend is not willing to upgrade to a new client version yet, which
  is fine because all 1.x releases of Subversion clients are supposed
  to be compatible with all 1.y releases of Subversion servers. He should
  not have to upgrade his client just because the server was upgraded.
 
  In his working copy, the file name is also in NFD form. When he
  talks to the server, the server provides the name in NFC. Because he
  is using the old client the client has no way of knowing how to map
  the NFC name to its local NFD file. So we've broken backwards
  compatibility for my friend.
 
 I think we cannot avoid this. So this patch is for 2.x, which may
 break backward compatibility.

If we are ever going to break compatibility, this issue will
certainly be addressed by normalising all paths as you suggest.
It was an unfortunate oversight that no NFD/NFC normalisation
was implemented in the first place :(

However, we really do not want to break compatibility at this time.
A solution that does not require us to break compatibility would
be much better. Nobody knows yet when the time for 2.x will come.

As far as I know, HFS+ is the only filesystem that has this problem.
It is possible to use other filesystems on Mac OS X as a workaround.
For example UFS, ext2, or NTFS (via FUSE).

I think Subversion's backwards compatibility is very important and
should not be jeopardised because of the behaviour of one filesystem.
 
 If we have two files with the same filename, one in NFC, the other in NFD,
 it is really a headache for us to normalize all paths to NFC. The only thing
 we can do is just keep one file of the two and throw away the other file.
 
 In reality, I think this is a rare case. If we find this collision when
 upgrading repositories, we should stop and provide a way for users to
 choose which one to save.

I agree that this is probably a rare case in practice. However, we must
be prepared to handle it. Users who run into this problem can lose the
ability to use newer versions of Subversion to read their data.
This cannot be allowed to happen because we want to stay compatible.

  As you can see, there is a lot of complexity involved in fixing this
  issue. I hope you aren't discouraged by this. Someone will need to
  explore the details of these problems to fix this issue. I am not convinced
  that it is impossible to fix. We'll need to be very careful about backwards
  compatibility when making decisions. But there might be ways to achieve a
  satisfying solution nonetheless.
 
 Like other people say, we should prohibit the NFC/NFD same filename collision,
 not in the subversion system, but in operational rules: "just don't do that".

So far, "don't do that" has been the answer to this entire problem.
We've been telling people if they want to use non-ASCII characters
with both Windows/Linux and Mac OS X clients they should not be using HFS+.

And mixing various unicode forms works fine today if the filesystem
used by the client supports this. The use case Neels contrived, where
developers want to test their code with unicode filenames in various
NFD/NFC forms, and check those test files into Subversion, should still
be supported.

 Then, the rest of the problem seems rather simple. Convert *all* input paths to NFC
 first, then do the work. All input means paths passed to servers from clients,
 paths obtained by servers from repositories, paths obtained by clients from
 working copies. Is that correct?

Yes, that is correct. Also, paths obtained by clients from the local
filesystem, and paths sent by servers to clients.

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Johan Corveleyn

On Mon, Jan 30, 2012 at 8:10 PM, Stefan Sperling s...@elego.de wrote:
 On Tue, Jan 31, 2012 at 01:42:21AM +0900, Hiroaki Nakamura wrote:
 2012/1/30 Stefan Sperling s...@elego.de:

[ ... ]

 And mixing various unicode forms works fine today if the filesystem
 used by the client supports this. The use case Neels contrived, where
 developers want to test their code with unicode filenames in various
 NFD/NFC forms, and check those test files into Subversion, should still
 be supported.

Indeed.

Though this means that unconditional NFC (or whatever) normalization
in the working copy database is not an option, since it precludes
representing multiple forms at the same time in the wc. Except maybe
dependent on the (filesystem of the) client platform.

Of course, if a repository needs to support also checkouts to OSX/HFS+
clients, it should be configured to disallow multiple (conflicting)
forms to enter the repository. This can be done with a pre-commit
hook, similar to case-insensitive.py [1], which does the same for
case-clashing files.

(BTW, case-insensitive.py works by comparing incoming adds with the
list of directory entries of the corresponding directory within the
txn (comparing their normalized forms))
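
The same comparison, applied to composition instead of case, might look roughly like the Python sketch below. This is not the code of case-insensitive.py; the function name clashing_adds is made up, the directory entries are passed in as a plain list instead of being read from the txn, and NFC is assumed as the comparison form.

```python
import unicodedata


def clashing_adds(existing_entries, added_names):
    """Return (added, existing) pairs whose NFC forms are equal
    although the names themselves differ."""
    by_key = {}
    for name in existing_entries:
        by_key.setdefault(unicodedata.normalize("NFC", name), name)
    clashes = []
    for name in added_names:
        key = unicodedata.normalize("NFC", name)
        other = by_key.get(key)
        if other is not None and other != name:
            clashes.append((name, other))
        # Later adds in the same commit are checked against this one too.
        by_key.setdefault(key, name)
    return clashes


existing = ["f\u00f6\u00f6", "bar"]          # 'föö' stored in NFC
added = ["fo\u0308o\u0308", "baz"]           # 'föö' in NFD, plus a new name
print(len(clashing_adds(existing, added)))
```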

-- 
Johan

[1] 
http://svn.apache.org/repos/asf/subversion/trunk/contrib/hook-scripts/case-insensitive.py

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Branko Čibej

On 30.01.2012 21:00, Johan Corveleyn wrote:
 On Mon, Jan 30, 2012 at 8:10 PM, Stefan Sperling s...@elego.de wrote:
 On Tue, Jan 31, 2012 at 01:42:21AM +0900, Hiroaki Nakamura wrote:
 2012/1/30 Stefan Sperling s...@elego.de:
 [ ... ]

 And mixing various unicode forms works fine today if the filesystem
 used by the client supports this. The use case Neels contrived, where
 developers want to test their code with unicode filenames in various
 NFD/NFC forms, and check those test files into Subversion, should still
 be supported.
 Indeed.

 Though this means that unconditional NFC (or whatever) normalization
 in the working copy database is not an option, since it precludes
 representing multiple forms at the same time in the wc. Except maybe
 dependent on the (filesystem of the) client platform.

Are you seriously proposing that we /support/ such broken, hackish
nonsense? How do you expect users to tell the difference between file
names that look identical on the character level, but are not on the
code point level?

Supporting such hacks would only be a source of bug reports. I don't see
this as a desirable feature.

And as for doing the server-side checks in pre-commit hooks ... I guess
you could write a whole libsvn_repos implementation merely as a set of
pre-commit hooks, but who would want to? Hooks aren't intended for
implementing core functionality.

-- Brane

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Johan Corveleyn

On Mon, Jan 30, 2012 at 9:09 PM, Branko Čibej br...@xbc.nu wrote:
 On 30.01.2012 21:00, Johan Corveleyn wrote:
 On Mon, Jan 30, 2012 at 8:10 PM, Stefan Sperling s...@elego.de wrote:
 On Tue, Jan 31, 2012 at 01:42:21AM +0900, Hiroaki Nakamura wrote:
 2012/1/30 Stefan Sperling s...@elego.de:
 [ ... ]

 And mixing various unicode forms works fine today if the filesystem
 used by the client supports this. The use case Neels contrived, where
 developers want to test their code with unicode filenames in various
 NFD/NFC forms, and check those test files into Subversion, should still
 be supported.
 Indeed.

 Though this means that unconditional NFC (or whatever) normalization
 in the working copy database is not an option, since it precludes
 representing multiple forms at the same time in the wc. Except maybe
 dependent on the (filesystem of the) client platform.

 Are you seriously proposing that we /support/ such broken, hackish
 nonsense? How do you expect users to tell the difference between file
 names that look identical on the character level, but are not on the
 code point level?

Huh? I'm not proposing anything. Hiroaki suggested (with his patch and
followup discussion) to do normalization to NFC in wc.db (or something
like that, so that all paths that enter wc.db are in NFC form). All
I'm saying is that this conflicts with the use case
Neels contrived, to represent multiple forms in the working copy.
Except if you allow some clients to do it, and others not (either by a
client-side option, or by platform-specific behavior).

 Supporting such hacks would only be a source of bug reports. I don't see
 this as a desirable feature.

No problem, I don't either. I'm not really participating in this
discussion (got enough discussions going on already :-)). Just wanted
to point out the conflict ...

 And as for doing the server-side checks in pre-commit hooks ... I guess
 you could write a whole libsvn_repos implementation merely as a set of
 pre-commit hooks, but who would want to? Hooks aren't intended for
 implementing core functionality.

Ok, then I also propose that case-insensitive.py should be folded into
core functionality (server-side option). That would be vastly better
of course, more performant etc ...

So I totally agree.

-- 
Johan

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Stefan Sperling

On Mon, Jan 30, 2012 at 09:09:22PM +0100, Branko Čibej wrote:
 Are you seriously proposing that we /support/ such broken, hackish
 nonsense? How do you expect users to tell the difference between file
 names that look identical on the character level, but are not on the
 code point level?

 Supporting such hacks would only be a source of bug reports. I don't see
 this as a desirable feature.

The question is why you would want to break it now that it works.
Because of HFS+? Isn't what HFS+ does just as broken if you think
about it? Why normalise paths in the filesystem if nobody else does it?

I'd prefer a universe where svn normalises anything to NFC from the
1.0 release onwards. Alas, we're in the wrong one.
Compare http://www.qwantz.com/index.php?comic=34 and following.
. o O (Where's my goatee?)

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Branko Čibej

On 30.01.2012 21:29, Stefan Sperling wrote:
 On Mon, Jan 30, 2012 at 09:09:22PM +0100, Branko Čibej wrote:
 Are you seriously proposing that we /support/ such broken, hackish
 nonsense? How do you expect users to tell the difference between file
 names that look identical on the character level, but are not on the
 code point level?

 Supporting such hacks would only be a source of bug reports. I don't see
 this as a desirable feature.
 The question is why you would want to break it now that it works.
 Because of HFS+? Isn't what HFS+ does just as broken if you think
 about it? Why normalise paths in the filesystem if nobody else does it?

You're aware that MacPorts subversion already has a hack to normalize
the other way, at least over the wire. :)

Sure, if you want to turn on such normalization, you pretty much have to
dump and reload the repository as well as upgrading all working copies
(again). Either that, or use form-independent comparison on the server,
which isn't such a bad idea anyway. Doing that in wc.db is probably harder.

-- Brane

Re: Implicit keep-alive after reintegrate merge

2012-01-30 Thread Johan Corveleyn

On Mon, Jan 30, 2012 at 2:23 PM, Stefan Sperling s...@elego.de wrote:
 On Tue, Jan 24, 2012 at 01:12:39AM +0100, Branko Čibej wrote:
 By the way, I read Stefan's description of why --reintegrate is
 necessary, and after slogging through the unfortunate terminology (2-URL
 merge doesn't mean a thing in CM theory :) and one little bit caught my
 attention:

  A sync merge can fill in all the parameters as well, except PATH2.
  However, it needs to do so in a different way. With a sync merge
  PATH1 and PATH2 are the same

 I keep reading this in the context of the rest of the reasoning, and my
 reaction is still: WTF? Bogus! This looks like someone /started off/
 with the assumption that a sync merge can take shortcuts where a
 reintegrate merge cannot; but, so sorry, that's just plain nonsense.

[ ... ]

 Generally, we want to avoid spurious conflicts from diff3 which happen
 when changes are applied twice because diff3 is not idempotent.
 I.e. we break the nice symmetry to work around a limitation of diff3.

 In the following case we can avoid spurious conflicts by picking
 our parameters carefully:

     (3)
                +-b@r2--+ b@r3--b@r4-b@r5 
      (branch) /        ^             | (merge 2)
              /         | (merge 1)   v
        --- a@r1 --a@r2---+- a@r6 

 Merge 1 brings a@r2 into b@r2.
 Merge 2 brings b@r4 into a@r5.

  (3.1) svn co b@r2 wc; svn merge a@r1 a@r2 b

 There are two ways of performing merge 2.
 The first is symmetrical and re-applies a@r2 to a@r6, via b@r3,
 with possible spurious conflicts from diff3:

  (3.2 a) svn co a@r5 wc; svn merge b@r2 b@r5 a

 The second does not re-apply a@r2, so there are no possible conflicts
 from diff3 because of a@r2/b@r3. Only b@r4 can conflict.

  (3.2 b) svn co a@r5 wc; svn merge b@r3 b@r5 a

 The result is the same, however.

 What we use during --reintegrate is (3.2 b).
 You can argue that this approach is broken and we should be using (3.2 a)
 for symmetry, and let users deal with spurious conflicts.

No, AFAIU, Brane's suggestion was not that we shouldn't use the
reintegrate-way for 3.2, but rather that we should *always* use the
reintegrate-way, also for sync merges. So that a sync merge picks
its arguments for the 2-URL merge in the same way as a reintegrate
merge. Unless I misunderstood what Brane meant.

And I think he's got a point. I don't have the time to write up a
detailed example right now, but I think it should work.

If that were the case, we would effectively implement the merges
completely symmetrically: always the reintegrate-way.

-- 
Johan

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Stefan Sperling

On Mon, Jan 30, 2012 at 09:34:03PM +0100, Branko Čibej wrote:
 Sure, if you want to turn on such normalization, you pretty much have to
 dump and reload the repository as well as upgrading all working copies
 (again). Either that, or use form-independent comparison on the server,
 which isn't such a bad idea anyway. Doing that in wc.db is probably harder.

It is indeed harder because we are passing paths verbatim to sqlite.
I doubt having more than one form of a given path in wc.db is fun...
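
A small sketch of why verbatim storage is awkward, using Python's sqlite3 module. The one-column table below is hypothetical and much simpler than the real wc.db schema; it only shows that SQLite's default text comparison treats the NFC and NFD spellings as two different keys.

```python
import sqlite3
import unicodedata

conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration; the real wc.db schema is different.
conn.execute("CREATE TABLE nodes (local_relpath TEXT PRIMARY KEY)")

nfc = unicodedata.normalize("NFC", "caf\u00e9.txt")
nfd = unicodedata.normalize("NFD", "caf\u00e9.txt")

# The default collation compares the stored bytes, so both inserts succeed
# even though the two names look identical to a user.
conn.execute("INSERT INTO nodes VALUES (?)", (nfc,))
conn.execute("INSERT INTO nodes VALUES (?)", (nfd,))

row_count = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
nfc_hits = conn.execute(
    "SELECT COUNT(*) FROM nodes WHERE local_relpath = ?", (nfc,)
).fetchone()[0]
print(row_count, nfc_hits)
```

Both rows are stored, and a lookup by one form does not find the other.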

Re: Implicit keep-alive after reintegrate merge

2012-01-30 Thread Stefan Sperling

On Mon, Jan 30, 2012 at 10:37:51PM +0100, Johan Corveleyn wrote:
 On Mon, Jan 30, 2012 at 10:19 PM, Stefan Sperling s...@elego.de wrote:
  But you cannot use the last-synced revision as left anchor either:
   svn co b
   svn merge b@r4 a@r6 b
  Because applying this delta reverts b@r5 (this change appears reversed
  in the diff between b@r4 and a@r6 since it isn't present on branch a).
 
 No, I don't think it does. The change b@r5 doesn't appear in this
 diff, neither forward nor reversed. Say b@r5 adds a line in file b/X,
 I see no reason this change (forward or reversed) would be part of the
 difference between b@r4 and a@r6.

Sorry, I made an error while transferring my experiment into an example.

The problem happens if a non-merge commit (b@r4) happens prior to
the first merge commit (b@r5), like this:

                 +b@r2---b@r4---b@r5--b@r6---b@r8--
       (branch) /               ^             ^ (merge 2)
               /                | (merge 1)   |
         --- a@r1---a@r3+---a@r7--+---

b@r4 appears reversed in 'svn diff b@5 a@7' -- not good
a@r3 appears in 'svn diff b@2 a@7' -- not good either, applied twice
But 'svn merge a@r5 a@r7' works fine.
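
The three cases can be checked with a toy model in Python. This is only a sketch: each revision is modelled as the set of lines in the tree (taken from the transcript), and the helper check_merge is hypothetical. It is not how Subversion computes merges. Since branch a does not change between r3 and r5, a@3 stands in for a@r5.

```python
# Toy model: each revision is the set of lines present in the tree.
# Illustrative only; this is not how Subversion computes merges.
REVS = {
    "b@2": {"alpha", "beta"},
    "a@3": {"alpha", "beta", "foo"},
    "b@4": {"alpha", "beta", "bar"},
    "b@5": {"alpha", "beta", "bar", "foo"},          # result of merge 1
    "b@6": {"alpha", "beta", "bar", "foo", "bar2"},
    "a@7": {"alpha", "beta", "foo", "foo2"},
}


def check_merge(left, right, target):
    """Return (reapplied, reverted) for applying diff(left, right) to target."""
    added = REVS[right] - REVS[left]
    removed = REVS[left] - REVS[right]
    reapplied = added & REVS[target]    # already on the target: spurious conflict
    reverted = removed & REVS[target]   # would be taken away from the target
    return reapplied, reverted


# Merge 2 targets branch b at its latest state, b@6.
for left in ("b@5", "b@2", "a@3"):
    print(left, check_merge(left, "a@7", "b@6"))
```

In this model the b@5 anchor reverts "bar", the b@2 anchor re-applies "foo", and the a@3 anchor does neither.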

Full transcript below.

+ rm -rf merge-test
+ mkdir -p merge-test
+ mkdir -p merge-test/a
+ echo alpha
+ > merge-test/a/alpha
+ echo beta
+ > merge-test/a/beta
+ svnadmin create /tmp/merge-test/repos
+ svn import merge-test/a file:///tmp/merge-test/repos/a -m importing project tree
Adding         merge-test/a/alpha
Adding         merge-test/a/beta

Committed revision 1.
+ svn copy file:///tmp/merge-test/repos/a file:///tmp/merge-test/repos/b -m creating b

Committed revision 2.
+ rm -rf merge-test/a
+ svn checkout file:///tmp/merge-test/repos/a merge-test/a
A    merge-test/a/alpha
A    merge-test/a/beta
Checked out revision 2.
+ svn checkout file:///tmp/merge-test/repos/b merge-test/b
A    merge-test/b/alpha
A    merge-test/b/beta
Checked out revision 2.
+ echo foo
+ >> merge-test/a/alpha
+ svn commit merge-test/a -mm
Sending        merge-test/a/alpha
Transmitting file data .
Committed revision 3.
+ echo bar
+ >> merge-test/b/beta
+ svn commit merge-test/b -mm
Sending        merge-test/b/beta
Transmitting file data .
Committed revision 4.
+ svn up merge-test/b
Updating 'merge-test/b':
At revision 4.
+ svn merge file:///tmp/merge-test/repos/b@2 file:///tmp/merge-test/repos/a@3 merge-test/b
--- Merging differences between repository URLs into 'merge-test/b':
U    merge-test/b/alpha
--- Recording mergeinfo for merge between repository URLs into 'merge-test/b':
 U   merge-test/b
+ svn commit merge-test/b -mm
Sending        merge-test/b
Sending        merge-test/b/alpha
Transmitting file data .
Committed revision 5.
+ echo bar2
+ >> merge-test/b/beta
+ svn commit merge-test/b -mm
Sending        merge-test/b/beta
Transmitting file data .
Committed revision 6.
+ echo foo2
+ >> merge-test/a/alpha
+ svn commit merge-test/a -mm
Sending        merge-test/a/alpha
Transmitting file data .
Committed revision 7.
+ svn diff file:///tmp/merge-test/repos/b@2 file:///tmp/merge-test/repos/a@7
Index: alpha
===================================================================
--- alpha   (.../b) (revision 2)
+++ alpha   (.../a) (revision 7)
@@ -1 +1,3 @@
 alpha
+foo
+foo2
+ svn up merge-test/b
Updating 'merge-test/b':
At revision 7.
+ svn merge --accept=postpone file:///tmp/merge-test/repos/b@2 file:///tmp/merge-test/repos/a@7 merge-test/b
--- Merging differences between repository URLs into 'merge-test/b':
C    merge-test/b/alpha
--- Recording mergeinfo for merge between repository URLs into 'merge-test/b':
 U   merge-test/b
Summary of conflicts:
  Text conflicts: 1
+ svn diff merge-test/b
Index: merge-test/b
===================================================================
--- merge-test/b    (revision 7)
+++ merge-test/b    (working copy)

Property changes on: merge-test/b
___________________________________________________________________
Modified: svn:mergeinfo
   Merged /a:r4-7
Index: merge-test/b/alpha
===================================================================
--- merge-test/b/alpha  (revision 7)
+++ merge-test/b/alpha  (working copy)
@@ -1,2 +1,7 @@
 alpha
+<<<<<<< .working
 foo
+=======
+foo
+foo2
+>>>>>>> .merge-right.r7
+ svn revert -R merge-test/b
Reverted 'merge-test/b'
Reverted 'merge-test/b/alpha'
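+ svn diff file:///tmp/merge-test/repos/b@5 file:///tmp/merge-test/repos/a@7
Index: alpha
===================================================================
--- alpha   (.../b) (revision 5)
+++ alpha   (.../a) (revision 7)
@@ -1,2 +1,3 @@
 alpha
 foo
+foo2
Index: beta
===================================================================
--- beta    (.../b) (revision 5)
+++ beta    (.../a) (revision 7)
@@ -1,2 +1 @@
 beta
-bar
Index: .
===================================================================
--- .   (.../b) (revision 5)
+++ .   (.../a) (revision 7)

Property changes on: .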

Re: Implicit keep-alive after reintegrate merge

2012-01-30 Thread Johan Corveleyn

On Mon, Jan 30, 2012 at 11:16 PM, Stefan Sperling s...@elego.de wrote:
 On Mon, Jan 30, 2012 at 10:37:51PM +0100, Johan Corveleyn wrote:
 On Mon, Jan 30, 2012 at 10:19 PM, Stefan Sperling s...@elego.de wrote:
  But you cannot use the last-synced revision as left anchor either:
   svn co b
   svn merge b@r4 a@r6 b
  Because applying this delta reverts b@r5 (this change appears reversed
  in the diff between b@r4 and a@r6 since it isn't present on branch a).

 No, I don't think it does. The change b@r5 doesn't appear in this
 diff, neither forward nor reversed. Say b@r5 adds a line in file b/X,
 I see no reason this change (forward or reversed) would be part of the
 difference between b@r4 and a@r6.

 Sorry, I made an error while transferring my experiment into an example.

 The problem happens if a non-merge commit (b@r4) happens prior to
 the first merge commit (b@r5), like this:

                 +b@r2---b@r4---b@r5--b@r6---b@r8--
       (branch) /               ^             ^ (merge 2)
               /                | (merge 1)   |
         --- a@r1---a@r3+---a@r7--+---

 b@r4 appears reversed in 'svn diff b@5 a@7' -- not good
 a@r3 appears in 'svn diff b@2 a@7' -- not good either, applied twice
 But 'svn merge a@r5 a@r7' works fine.

Ah yes, I see. Thanks for clarifying.

However, I'm still not convinced :-). I'm starting to think more about
the symmetry with the standard sync-reintegrate cycle. In one of your
previous mails in this thread you described reintegrate like this:

On Mon, Jan 30, 2012 at 2:51 PM, Stefan Sperling s...@elego.de wrote:
 On Mon, Jan 30, 2012 at 02:23:46PM +0100, Stefan Sperling wrote:
 What we use during --reintegrate is (3.2 b).

 And here I'm catching myself spreading misinformation again.

 There is another tweak we use during reintegrate.
 Consider the graph again (fixed version):

      (3)
                 +-b@r2--+ b@r4--b@r5-b@r6
       (branch) /        ^             | (merge 2)
               /         | (merge 1)   v
         --- a@r1 --a@r3---+- a@r7

 If we used:

  (3.2 b) svn co a@r6 wc; svn merge b@r4 b@r7 a

 we'd miss b@r2 during the merge. It won't be applied to branch a.
 But we want it in the changeset.

 So what really happens is:

  (3.2 b) svn co a@r6 wc; svn merge a@r4 b@r6 a

 Note that a is compared to b, rather than b against its former self.
 This delta includes b@r2 because this change isn't yet on branch a.

So the left argument of the 2-URL merge is
target@last-rev-target-was-brought-in-sync (and the right is
source@HEAD). That makes sense.

If we translate this to our situation, i.e. the other way around, then
'svn merge b@2 a@7 b' would be the one. Because b@2 is the last time b
was still synced with a. But there is the problem of change a@r3 then
being applied twice. However, isn't this the same as the multiple
reintegrate problem, i.e. implicit keep-alive after reintegrate?

Your example is effectively reintegrating the same branch twice.
Which means the same problem applies. And maybe the solution is: we
should be able to skip the already reintegrated stuff, i.e. a@r3.
(I'm not sure anymore what the state-of-the-art is concerning the
implicit keep-alive stuff, but maybe it's that 'svn diff b@2 a@7'
needs to be adjusted by subtracting a@3 from it, because that's
already been applied)

So I'm guessing that if we can solve the implicit keepalive after
reintegrate, i.e. let reintegrate skip the already integrated stuff,
we would no longer need --reintegrate, because everything can now be
done with a reintegrate merge.

-- 
Johan

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Peter Samuelson


[Stefan Sperling]
 It is indeed harder because we are passing paths verbatim to sqlite.
 I doubt having more than one form of a given path in wc.db is fun...

That's the implementation I would like to see, to be honest.  Start
with the observation that we can treat Mac OS X NFD paths as a client
character encoding.  Now observe that it is lossy.  But ... almost all
non-Unicode client charsets are equally lossy, for exactly the same
reason!

This suggests maintaining a mapping table in wc.db between server paths
(UTF-8, unspecified NF) and wc paths (local charset, which is
occasionally UTF-8 with NFD).

This mapping table would be maintained any time we write to the wc.
It would be consulted any time we search for files in the wc.

It's not really extra work - we have to do those UTF-8 <-> local
charset conversions all the time anyway.  This would in fact cache
those conversions.

The implementation on OS X might be a bit hairy, if there isn't an easy
way to retrieve the real pathname of the file you just created.
Anywhere else, we just store the pathname we just calculated.
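
A rough sketch of such a mapping table, in Python with sqlite3. Everything named here is an assumption for illustration: the path_map table, the helper functions, and the use of NFD normalization as a stand-in for the conversion to the local charset. None of it is part of the real wc.db.

```python
import sqlite3
import unicodedata

conn = sqlite3.connect(":memory:")
# Hypothetical table; not part of the real wc.db schema.
conn.execute(
    "CREATE TABLE path_map ("
    " repos_path TEXT PRIMARY KEY,"
    " wc_path TEXT UNIQUE NOT NULL)"
)


def to_local(repos_path: str) -> str:
    """Stand-in for the UTF-8 to local conversion; mimics an NFD filesystem."""
    return unicodedata.normalize("NFD", repos_path)


def record_write(repos_path: str) -> str:
    """Maintain the mapping whenever a file is written to the wc."""
    wc_path = to_local(repos_path)
    conn.execute(
        "INSERT OR REPLACE INTO path_map VALUES (?, ?)", (repos_path, wc_path)
    )
    return wc_path


def lookup_repos_path(wc_path: str):
    """Consult the mapping when a file found on disk must be matched."""
    row = conn.execute(
        "SELECT repos_path FROM path_map WHERE wc_path = ?", (wc_path,)
    ).fetchone()
    return row[0] if row else None


server_name = "caf\u00e9.txt"            # as stored on the server (NFC here)
on_disk = record_write(server_name)      # what the filesystem ends up with
print(lookup_repos_path(on_disk) == server_name)
```

The server spelling is recovered from the on-disk spelling by lookup, so the client never has to guess which form the repository uses.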

Peter

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Branko Čibej

On 31.01.2012 00:14, Peter Samuelson wrote:
 [Stefan Sperling]
 It is indeed harder because we are passing paths verbatim to sqlite.
 I doubt having more than one form of a given path in wc.db is fun...
 That's the implementation I would like to see, to be honest.  Start
 with the observation that we can treat Mac OS X NFD paths as a client
 character encoding.  Now observe that it is lossy.  But ... almost all
 non-Unicode client charsets are equally lossy, for exactly the same
 reason!

 This suggests maintaining a mapping table in wc.db between server paths
 (UTF-8, unspecified NF) and wc paths (local charset, which is
 occasionally UTF-8 with NFD).

 This mapping table would be maintained any time we write to the wc.
 It would be consulted any time we search for files in the wc.

 It's not really extra work - we have to do those UTF-8 <-> local
 charset conversions all the time anyway.  This would in fact cache
 those conversions.

 The implementation on OS X might be a bit hairy, if there isn't an easy
 way to retrieve the real pathname of the file you just created.
 Anywhere else, we just store the pathname we just calculated.


Afaik the OSX API normalizes everything to NFD automagically. So at
least on that platform there's no chance of having more than one form
for the same filename at the same time. Likewise on Windows, which
normalizes to NFC.

I don't see what you mean by lossy though. NFD and NFC can represent
exactly the same set of characters, it's just that the representations
of some of them are different. Thus, this does not preclude normalizing
the paths in wc.db, and that's even easily automated. If such a
conversion finds a name collision ... the user is in serious trouble
already. :)

It's more likely to find such a collision on Unix than either Mac OS or
Windows (both of which normalize on the FS API level). But this case is
probably so rare that I wouldn't worry about it.
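
A sketch of how such an automated conversion could look for name collisions, in Python. The function find_collisions is hypothetical; it groups a list of paths by their NFC form and reports any group with more than one spelling.

```python
import unicodedata
from collections import defaultdict


def find_collisions(paths, form="NFC"):
    """Group paths by normalized form; return groups with several spellings."""
    groups = defaultdict(list)
    for path in paths:
        groups[unicodedata.normalize(form, path)].append(path)
    return {key: names for key, names in groups.items() if len(names) > 1}


paths = ["caf\u00e9.txt", "cafe\u0301.txt", "README"]
collisions = find_collisions(paths)
for normalized, names in collisions.items():
    print(repr(normalized), [repr(n) for n in names])
```

Converting NFD back to NFC gives the same result as normalizing the original directly, which is the sense in which the two forms represent the same set of characters.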

-- Brane

Re: Implicit keep-alive after reintegrate merge

2012-01-30 Thread Branko Čibej

On 30.01.2012 22:19, Stefan Sperling wrote:
 On Mon, Jan 30, 2012 at 09:38:15PM +0100, Johan Corveleyn wrote:
 No, AFAIU, Brane's suggestion was not that we shouldn't use the
 reintegrate-way for 3.2, but rather that we should *always* use the
 reintegrate-way, also for sync merges. So that a sync merge picks
 its arguments for the 2-URL merge in the same way as a reintegrate
 merge. Unless I misunderstood what Brane meant.

 And I think he's got a point. I don't have the time to write up a
 detailed example right now, but I think it should work.

 If that would be the case, then we effectively implement the merges
 completely symmetrical: always the reintegrate-way.
 Counter-example:

                  +b@r2---b@r4---b@r5--b@r7-
        (branch) /         ^            ^ (merge 2)
                /          | (merge 1)  |
          --- a@r1---a@r3-+-a@r6--+---

 This performs two sync merges from a to b.

 The first merge can be done the reintegrate way:

   svn co b
   svn merge b@r2 a@r3 b
  
 This merge applies the a@r3 change to b@r2, yielding b@r4. Fine.

 But how would you perform the second merge, which applies a@r6 to
 b@r5 yielding b@r7, using the reintegrate way, without undoing
 b@r5 (a non-merge commit)?

The second merge is exactly the same as if the branch were created from
a@r3, not a@r1. Right? In your example, this is even trivially true,
since there were no changes on the branch between r2 and r4. But given
the slightly more complex example:

                 +b@r2--b@r4-+-b@r5---b@r6---+--b@r8-
       (branch) /            ^               ^ (merge 2) \
               /             | (merge 1)     |            \ (merge 3)
         --- a@r1---a@r3-+-a@r7--+-+-a@r9 ---


the results should be, effectively:

  * merge 1:
      diff3 b@r4 a@r1 a@r3 | patch b
  * merge 2:
      diff3 b@r6 a@r3 a@r7 | patch b
  * merge 3:
      diff3 a@r7 b@r2 b@r4 | patch a
      diff3 a@r7 b@r5 b@r6 | patch a

Merge 3 is a cherry-pick merge, of course. But whatever you do, you
always pick your common ancestor so that it's the most recent merge
point from the revision you're merging, and the myfile is always the
most recent version on the branch you're merging to (or, effectively, in
the target WC).

You'll be cherry-picking as long as both branches are being actively
modified, but you always have to do the check. The merge algorithm is
symmetric.

Now I don't know how the above merges translate into svn merge syntax
and whether or not --reintegrate does it this way, but that's how you do
it manually with diff3. SVN's mergeinfo has all the data that are
required to automate this merging, it's just not being used correctly.
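
The symmetry can be illustrated with a toy model in Python, where every revision is a set of change labels and merge3 plays the role of "diff3 MYFILE OLDFILE YOURFILE | patch". The labels and the merge3 helper are made up for this sketch, and the second cherry-pick of merge 3 is assumed to start from the result of the first. Real text merging is of course more involved.

```python
def merge3(mine, old, yours):
    """Toy three-way merge on sets: apply the change old -> yours to mine."""
    return (mine | (yours - old)) - (old - yours)


a1 = frozenset({"base"})
a3 = a1 | {"a3"}
a7 = a3 | {"a7"}

b2 = a1                          # branch point
b4 = b2 | {"b4"}
b5 = merge3(b4, a1, a3)          # merge 1
b6 = b5 | {"b6"}
b8 = merge3(b6, a3, a7)          # merge 2

step = merge3(a7, b2, b4)        # merge 3, first cherry-pick
a9 = merge3(step, b5, b6)        # merge 3, second cherry-pick

print(sorted(a9))
print(a9 == b8)
```

In this model both branches end up with the same set of changes, and no change is applied twice, because each merge uses the most recent merge point as its common ancestor.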

-- Brane

RE: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Bert Huijben

 -Original Message-
 From: Branko Čibej [mailto:br...@xbc.nu]
 Sent: Monday, 30 January 2012 16:11
 To: dev@subversion.apache.org
 Subject: Re: Let's discuss about unicode compositions for filenames!

 On 31.01.2012 00:14, Peter Samuelson wrote:
  [Stefan Sperling]
  It is indeed harder because we are passing paths verbatim to sqlite.
  I doubt having more than one form of a given path in wc.db is fun...
  That's the implementation I would like to see, to be honest.  Start
  with the observation that we can treat Mac OS X NFD paths as a client
  character encoding.  Now observe that it is lossy.  But ... almost all
  non-Unicode client charsets are equally lossy, for exactly the same
  reason!

  This suggests maintaining a mapping table in wc.db between server paths
  (UTF-8, unspecified NF) and wc paths (local charset, which is
  occasionally UTF-8 with NFD).

  This mapping table would be maintained any time we write to the wc.
  It would be consulted any time we search for files in the wc.

  It's not really extra work - we have to do those UTF-8 <-> local
  charset conversions all the time anyway.  This would in fact cache
  those conversions.

  The implementation on OS X might be a bit hairy, if there isn't an easy
  way to retrieve the real pathname of the file you just created.
  Anywhere else, we just store the pathname we just calculated.

 Afaik the OSX API normalizes everything to NFD automagically. So at
 least on that platform there's no chance of having more than one form
 for the same filename at the same time. Likewise on Windows, which
 normalizes to NFC.

 I don't see what you mean by lossy though. NFD and NFC can represent
 exactly the same set of characters, it's just that the representations
 of some of them are different. Thus, this does not preclude normalizing
 the paths in wc.db, and that's even easily automated. If such a
 conversion finds a name collision ... the user is in serious trouble
 already. :)

 It's more likely to find such a collision on Unix than either Mac OS or
 Windows (both of which normalize on the FS API level). But this case is
 probably so rare that I wouldn't worry about it.

Last time we discussed this in depth (a few years ago), Windows didn't
perform the normalization you describe here.
Was this added later? (Any documentation pointers?)

I think the keyboard/editor support performs some normalization, so users
are unlikely to create non-normalized sequences, but our old documents say
that Windows just stores whatever it gets passed.
(Probably for the same reason Subversion does: compatibility with the
time when we didn't know about these problems.)

Bert

 -- Brane

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Branko Čibej

On 31.01.2012 02:47, Bert Huijben wrote:
 Last time we discussed this in depth (a few years ago), Windows didn't 
 perform the normalization you describe here.
 Was this added later? (Any documentation pointers?)

Ouch, you're right ... Windows API doesn't normalize the paths.

-- Brane

[l10n] Translation status report for trunk r1238144

2012-01-30 Thread Subversion Translation Status

Translation status report for trunk@r1238144

  lang   trans untrans   fuzzy     obs
--------------------------------------
    de    2082     190     308     270  UUooo
    es    2008     264     385     404  ++UUU~
    fr    2270       2       4       0  +~
    it    1863     409     562     225  +++U~~oo
    ja    2002     270     447     650  ++UUU~ooo
    ko    2144     128     171      70  ++U~~~
    nb    2061     211     326     378  +++UUU
    pl    2087     185     285     155  UUo
 pt_BR    1830     442     578     217  +++~~~oo
    sv    1785     487     601     223  ++U~~~oo
 zh_CN    2244      28      14       0  +~
 zh_TW    1766     506     629     283  ++U~~~oo
