Sorry to have left the discussion running so long without contributing to it myself. The reason I started with changing the repository / fs is that it is where we store the dataset that we'll need to support forever: working copies get destroyed and checked out over and over, every hour of every day. Repositories get created once and only accumulate data.

> > That doesn't solve the historical revisions containing "bad" paths. My understanding of the problem was that we'd go into the past and rewrite the paths into a single, canonical form.
>
> Agreed: an out-of-band solution fixes thing historically too.

As pointed out on IRC, I think it's important to stop adding semantically identical paths to a repository. From the perspective of efficiency, it might be handy to have a normalized version stored somewhere for all paths living in the repository, but to prevent the addition of differently encoded paths, such a thing isn't really required: the correct encoding can be calculated when the check happens.

> Having backend enforce NFC can wait for 2.0 I suppose :)

True, but the value of that might be limited: if we required all communications to be NFC encoded, we would need to take additional measures, as pointed out by Branko, to make things work on MacOS X. Currently, we have MacOS X shops happily working with non-ASCII characters in their paths, all NFD encoded. That would change.

By the way, Julian Foad, Philip Martin, Bert Huijben and I talked through a possible solution to the client-side issue, which becomes an option once we switch to wc-ng. The full impact of that change still needs to be determined, though, and it probably does not fit in the 1.7 timeline. If it seems it does, we'll bring it up.

To recap, the change I'm proposing is that we check pathnames with NFC/D-aware comparison routines upon add_file() / add_directory() inside libsvn_repos or libsvn_fs_*, of which I suspect the latter is the easier place to handle it. In my proposal, we don't specify a "repository normal" encoding. If performance degrades too much, we can enhance the filesystem with a stored normalized version of each path, which doesn't need to be recoded in order to do the comparison with the incoming path. Other than that, I don't think there's anything *required* to make us Unicode-aware on the server.
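
To make the check recapped above a bit more concrete, here is a rough sketch in Python (chosen only for brevity; the real thing would be C inside the fs layer). Everything in it is an assumption for illustration: `Directory`, its `add_file()` method and `same_path()` are hypothetical stand-ins, not libsvn_fs or libsvn_repos APIs. It only shows the idea of comparing names in a normalization-insensitive way at add time while storing them exactly as received.

```python
import unicodedata


def same_path(a: str, b: str) -> bool:
    """Return True if two names are canonically equivalent (NFC vs. NFD)."""
    # Normalizing both sides at check time means nothing normalized
    # has to be stored in the repository itself.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


class Directory:
    """Hypothetical stand-in for a directory node in the repository."""

    def __init__(self):
        # Names are kept exactly as the client sent them; no
        # "repository normal" encoding is imposed.
        self.entries = []

    def add_file(self, name: str) -> None:
        # Refuse a name that is semantically the same as an existing one,
        # even if its byte encoding differs.
        for existing in self.entries:
            if same_path(existing, name):
                raise ValueError(
                    f"path collides with existing entry {existing!r}")
        self.entries.append(name)


nfc = "caf\u00e9.txt"   # precomposed e-acute (NFC form)
nfd = "cafe\u0301.txt"  # 'e' plus combining acute accent (NFD form)

d = Directory()
d.add_file(nfd)         # e.g. committed from a MacOS X client
try:
    d.add_file(nfc)     # same name, different encoding: rejected
    collided = False
except ValueError:
    collided = True
```

The linear scan renormalizes every existing entry on each add. If that turns out to be too slow, caching a normalized key next to each stored name would be the kind of enhancement mentioned above; that is likewise only an assumption about how it could be done.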

The server-side check is also the change I'm proposing that cmpilato implement in libsvn_fs_base as a proof of concept.

This proposal says nothing about the client side. The client side can be fixed independently from the server side, given that we can't switch to normalized paths in the protocol until 2.0: whatever paths a server sends, the client will need to use those same paths when communicating back to the server.

Bye,

Erik.