[no subject]
The Japan Post Bank ATM opens at 9:00, so I'll wait until then, withdraw some money, and head home. --
Re: Let's discuss about unicode compositions for filenames!
2012/2/17 Vincent Lefevre vincent-...@vinc17.net:
> On 2012-01-30 21:29:41 +0100, Stefan Sperling wrote:
>> On Mon, Jan 30, 2012 at 09:09:22PM +0100, Branko Čibej wrote:
>>> Are you seriously proposing that we /support/ such broken, hackish
>>> nonsense? How do you expect users to tell the difference between file
>>> names that look identical on the character level, but are not on the
>>> code point level? Supporting such hacks would only be a source of bug
>>> reports. I don't see this as a desirable feature.
>>
>> The question is why you would want to break it now that it works.
>> Because of HFS+? [...]
>
> I think you mean because of Mac OS X. Indeed, unless this has changed,
> with the Mac OS X Terminal, when a user types an accented character,
> it is in NFD at the command line level. So, even if the user uses a
> conventional file system that can store both NFC and NFD, the filename
> will be in NFD, which will annoy Linux users.

Actually, whether a filename is in NFC or NFD depends on how the filename is entered. If you type all the characters, it is in NFC. If you use shell filename completion by hitting the tab key, it is in NFD. I tried this with Japanese filenames and confirmed it.

So, it is HFS+ which returns the filenames in NFD.

--
(Hiroaki Nakamura) hnaka...@gmail.com
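To make the difference between the typed and the tab-completed name concrete, here is a minimal Python sketch (not part of the original thread). The filename is a hypothetical example, and the standard `unicodedata` module only stands in for what the terminal and the file system do; it does not reproduce Mac OS X behaviour exactly.

```python
import unicodedata


def code_points(s):
    """Return the code points of s as a space-separated 'U+XXXX' string."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Hypothetical filename: HIRAGANA LETTER GA followed by ".txt".
typed = unicodedata.normalize("NFC", "\u304c.txt")      # as typed directly
completed = unicodedata.normalize("NFD", "\u304c.txt")  # as an NFD file system would return it

print(code_points(typed))
print(code_points(completed))
print(typed == completed)  # the two strings render identically but compare unequal
```

Both strings display as the same name, yet a byte-wise comparison, which is what Subversion does with paths, treats them as two different files.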
Re: Let's discuss about unicode compositions for filenames!
2012/2/9 Markus Schaber m.scha...@3s-software.com: Hi, Von: Stefan Sperling [mailto:s...@elego.de] On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote: [Upgrade options / backwards compatibility for proposed unicode normalization fix] - Need to re-checkout existing working copies of the repository? = Yes, but only if config is changed from the default. Maybe this could even be avoided if newer clients (or an utility script) can upgrade the working copy to the normalized format. Yes, if the working copy does not have filename collisions. However, for compatibility, we cannot let newer clients upgrade working copies automatically because existing working copies may have filename collisions. Best regards Markus Schaber -- ___ We software Automation. 3S-Smart Software Solutions GmbH Markus Schaber | Developer Memminger Str. 151 | 87439 Kempten | Germany | Tel. +49-831-54031-0 | Fax +49-831-54031-50 Email: m.scha...@3s-software.com | Web: http://www.3s-software.com CoDeSys internet forum: http://forum.3s-software.com Download CoDeSys sample projects: http://www.3s-software.com/index.shtml?sample_projects Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade register: Kempten HRB 6186 | Tax ID No.: DE 167014915 -- 中村 弘輝 )Hiroaki Nakamura) hnaka...@gmail.com
Re: Let's discuss about unicode compositions for filenames!
Hi,

2012/2/9 Thomas Åkesson tho...@akesson.cc:
> Hi,
>
> I have been interested in this issue for a couple of years and I
> remember it was discussed briefly at Subconf in Germany a couple of
> years ago.
>
> Branching the thread here because I'd like to propose a different
> approach than Hiroaki. This proposition is not very different from the
> note unicode-composition-for-filenames or what Peter S, Neels and
> others suggested, perhaps just combining 2 changes slightly
> differently. This is based on my limited understanding of WC-NG,
> please correct me if I make incorrect assumptions.
>
> - Server will still accept both NFC and NFD, however, it will no
>   longer accept collisions. Enforced by normalising to NFD before
>   uniqueness checks during add operations (yes, might be more
>   expensive). There will be no unified normalisation, but the
>   subversion server will work like most filesystems; return what was
>   given to it.

For compatibility, we cannot ignore existing repositories and working copies which have filename collisions. So we cannot force Subversion servers and clients to normalize filenames. We must let users choose, per repository, whether filenames are normalized or not.

--
(Hiroaki Nakamura) hnaka...@gmail.com
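The uniqueness check Thomas describes (normalise to NFD only for the comparison, but store and return the name as given) could look roughly like the following Python sketch. The `Directory` class and its methods are hypothetical illustrations, not Subversion APIs.

```python
import unicodedata


class Directory:
    """Hypothetical directory that keeps names as given but rejects
    names that are canonically equivalent to an existing entry."""

    def __init__(self):
        self._by_key = {}  # NFD key -> name exactly as it was added

    def add(self, name):
        key = unicodedata.normalize("NFD", name)  # used for comparison only
        if key in self._by_key:
            raise ValueError(
                f"{name!r} collides with existing entry {self._by_key[key]!r}")
        self._by_key[key] = name

    def names(self):
        # Return what was given, not the normalized key.
        return sorted(self._by_key.values())


d = Directory()
d.add("\u00e9.txt")          # precomposed form is stored unchanged
try:
    d.add("e\u0301.txt")     # decomposed form of the same name
except ValueError as err:
    print("rejected:", err)
```

This only covers new adds; as the reply above notes, it says nothing about repositories that already contain such collisions.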
Re: Let's discuss about unicode compositions for filenames!
2012/2/11 Branko Čibej br...@apache.org: On 11.02.2012 13:05, Hiroaki Nakamura wrote: 2012/2/9 Markus Schaber m.scha...@3s-software.com: Von: Stefan Sperling [mailto:s...@elego.de] On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote: [Upgrade options / backwards compatibility for proposed unicode normalization fix] - Need to re-checkout existing working copies of the repository? = Yes, but only if config is changed from the default. Maybe this could even be avoided if newer clients (or an utility script) can upgrade the working copy to the normalized format. Yes, if the working copy does not have filename collisions. However, for compatibility, we cannot let newer clients upgrade working copies automatically because existing working copies may have filename collisions. That's not entirely true, since we can detect the collisions in advance, and a partially upgraded working copy would still work From a practical point of view, it's very, very unlikely that there would be any such collisions in a valid working copy. People would tend to notice. :) Yes, I agree wholeheartedly! At work, I notice there are a few repositories which have NFC filenames and NFD filenames. However there is no repository which have collisions as far as I know. -- )Hiroaki Nakamura) hnaka...@gmail.com
Re: Let's discuss about unicode compositions for filenames!
Hi, thanks for your review.

2012/2/9 Stefan Sperling s...@elego.de:
> Open questions:

Here I try to answer these. Of course, everyone is welcome to answer.

> - How can the client retrieve the configuration from the server?
>   This is related to server-dictated configuration, see
>   http://wiki.apache.org/subversion/ServerDictatedConfiguration and
>   http://subversion.tigris.org/issues/show_bug.cgi?id=1974
>   This issue would need to be solved first.

I read those two pages and I think it can be done with server-dictated configuration.

> - What happens if NFC/NFD is enabled in repository config, but the
>   repository contains non-normalised paths (i.e. did not go through
>   a dump/load cycle to normalise all paths)?

I think we will provide a check command for finding out:
- whether a repository contains the same filename in different Unicode normalized/unnormalized forms;
- whether all filenames in a repository are NFC;
- whether all filenames in a repository are NFD.

One idea is to allow changing this config only during a load cycle, that is, to specify it as an option to the load command. When the load command finishes, the option value is saved in the config.

However, administrators can cheat and change the config file without loading, as the config file is a plain text file. So we cannot enforce that this config is set only by the load command. Therefore I think it should be the administrators' responsibility to ensure this config matches the repository.

> - How do we handle name collisions if both NFC and NFD forms exist
>   in a repository that sets the configuration to NFC or NFD?
>   Is an upgrade not supported in this case?

No, I think we do not support changing this config to NFC/NFD in this case. Only unicode-normalization 'none' is allowed.

>   Or will duplicate paths need to be discarded from history?
>   How can the user filter the paths, and how can the user decide
>   which path is kept?

I think we do not support these. Maybe repository admins could remove one of the duplicated filenames from the repository history and try to load again?

>   Or will duplicate paths be renamed throughout history?
>   How can the user rename the paths?

I think users can only normalize filenames during the load command. Users cannot rename filenames arbitrarily.

> Anything else?

I cannot think of more questions, but there might be more things to consider here.

--
(Hiroaki Nakamura) hnaka...@gmail.com
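The three checks listed above can be sketched in a few lines of Python. This is only an illustration under the assumption that the repository's paths have already been collected into a list; `check_paths` is a hypothetical helper, not an existing svnadmin subcommand.

```python
import unicodedata
from collections import defaultdict


def check_paths(paths):
    """Report normalization status for a list of repository paths.

    Returns a dict with:
      all_nfc    -- True if every path is already in NFC
      all_nfd    -- True if every path is already in NFD
      collisions -- groups of distinct paths that are canonically equivalent
    """
    groups = defaultdict(set)
    for p in paths:
        groups[unicodedata.normalize("NFC", p)].add(p)
    return {
        "all_nfc": all(unicodedata.normalize("NFC", p) == p for p in paths),
        "all_nfd": all(unicodedata.normalize("NFD", p) == p for p in paths),
        "collisions": [sorted(g) for g in groups.values() if len(g) > 1],
    }


# Hypothetical repository listing with one NFC/NFD collision.
report = check_paths(["trunk/\u00e9.txt", "trunk/e\u0301.txt", "trunk/README"])
print(report)
```

A repository for which `collisions` is empty could be switched to NFC or NFD by a dump/load cycle; one with collisions would have to stay at 'none', as described above.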
Re: Let's discuss about unicode compositions for filenames!
2012/2/7 Branko Čibej br...@apache.org: On 06.02.2012 22:26, Hiroaki Nakamura wrote: The Unicode Standard says canonical equivalent sequences should be interpreted the same way. * 1.1 Canonical and Compatibility Equivalence http://unicode.org/reports/tr15/#Canonical_Equivalence * 2.12 Equivalent Sequences and Normalization http://www.unicode.org/versions/Unicode6.0.0/ch02.pdf So we should not have the same name multiple times in repositories and working copies. Therefore subversion servers and clients does not need to handle them. *sigh* I don't give a gnat's whisker what the Unicode Standard says. I'm only interested in real-world situations. Or are you implying that, e.g., the Unix VFS layer will magically detect file name equality of different (de)normalized forms? Because it won't. -- Brane I'm interested in real-world situations, too. It is the reality that we need to avoid the same filenames in different forms because they confuse users so much. I don't think we expect file systems detect filename equality of different forms. Mac OS X HFS+ can have only NFD filenames and we must cope with it. And as you say, standard file systems in Linux and Windows does not magically detect file name equality of different forms. Also It's the reality we cannot force users to format their harddisks and change file systems. So communication layer must take care of this problem to provide interoperability among Windows, Linux and Mac. Subversion to the rescue! -- )Hiroaki Nakamura) hnaka...@gmail.com
Re: Let's discuss about unicode compositions for filenames!
Hi, all.

It seems there is no further discussion. I think the conclusion for the short-term solution is:

We convert unnormalized paths to NFC-normalized paths on clients only, that is, in svn_path_cstring_to_utf8.

It is the same approach as utf8precompose_macosx_2.patch in http://subversion.tigris.org/issues/show_bug.cgi?id=2464

It is proven to work, as it is included in the MacPorts unicode_path variant and the Homebrew --unicode-path option.

The difference is that this time we use utf8proc instead of Mac OS X APIs, and we do the conversion not only on Mac but on all platforms.

Do you agree? If so, I will update my patch and post it to http://subversion.tigris.org/issues/show_bug.cgi?id=2464

Best regards,

--
(Hiroaki Nakamura) hnaka...@gmail.com
Re: Let's discuss about unicode compositions for filenames!
2012/2/6 Stefan Sperling s...@elego.de: On Mon, Feb 06, 2012 at 02:28:40PM +0100, Branko Čibej wrote: On 06.02.2012 14:10, Hiroaki Nakamura wrote: Hi, all. It seems there is no further discussion. I think the conclusion for the short term solution is: We convert unnormalized paths to NFC normalized paths on clients only, that is, svn_path_cstring_to_utf8. It is the same approach as utf8precompose_macosx_2.patch in http://subversion.tigris.org/issues/show_bug.cgi?id=2464 It is proven to work as it is included in MacPorts unicode_path variant and Homebrew --unicode-path option. You'll note that MacPorts also warns you that using this option may cause interoperability issues with other clients that aren't using it, right? So this is hardly a universal solution that will not affect existing users and repositories. Exactly. This is what I meant when I said that we cannot apply the submitted patch as it is, at the very beginning of this thread. The submitted patch simply copies the MacPorts solution and has the same compatibility problems. I think the discussion made clear that there are two ways to move forward: 1) Implement a client-side mapping table which maps server-provided paths to local filesystem paths. It translates between one or more server-side and local representations of the same path. This could be done only on Mac OS X (or, preferrably, only on HFS+ filesystems) because only Mac OS X has problems. The idea here is to not change existing paths in repositories at all, no matter which way they are encoded, and to teach Mac OS X clients to cope with the problem locally. This way, other existing clients won't notice a difference. The only thing that won't work is to create a working copy on Mac OS X which contains the same name multiple times, in NFD and in some other normalised or non-normalised form. This approach was suggested by Peter. The Unicode Standard says canonical equivalent sequences should be interpreted the same way. 
* 1.1 Canonical and Compatibility Equivalence http://unicode.org/reports/tr15/#Canonical_Equivalence * 2.12 Equivalent Sequences and Normalization http://www.unicode.org/versions/Unicode6.0.0/ch02.pdf So we should not have the same name multiple times in repositories and working copies. Therefore subversion servers and clients does not need to handle them. Rather I think we should fix subversion to reject the same name in a different form. To handle existing repositories and working copies, maybe we should create a tool which checks repositories and working copies have the same name multiple times. If they have, users must rename files manually. In reality, I think this is extremely rare. We'd need either a working patch or a more detailed implementation design document to move forward here. OK. Peter, or somebody else, please give us either one of them. 2) Do something else that effects repositories, too, and provide a clean upgrade path for everyone (servers and clients). AFAIK nobody has made a suggestion as to what could be done here. What do you mean by a clean upgrade? Is it clean if we do dump and load for repositories and re-checkout for working copies? -- )Hiroaki Nakamura) hnaka...@gmail.com
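The client-side mapping table from option 1 might be sketched as follows. This is a hedged Python illustration, not Peter's actual design: the `PathMap` class is hypothetical, and plain NFD is used as a stand-in for the form HFS+ stores on disk, which is reportedly a variant of NFD rather than standard NFD.

```python
import unicodedata


class PathMap:
    """Hypothetical per-working-copy table: server path <-> local path."""

    def __init__(self):
        self._to_local = {}   # server path -> local (NFD) path
        self._to_server = {}  # local (NFD) path -> server path

    def register(self, server_path):
        """Record a server path and return the name to use on disk."""
        local = unicodedata.normalize("NFD", server_path)
        existing = self._to_server.get(local)
        if existing is not None and existing != server_path:
            # The one case that cannot work: two server names, one local name.
            raise ValueError(f"{server_path!r} and {existing!r} map to the same local file")
        self._to_local[server_path] = local
        self._to_server[local] = server_path
        return local

    def to_server(self, local_path):
        """Translate a name read from disk back to the server's spelling."""
        key = unicodedata.normalize("NFD", local_path)
        return self._to_server.get(key, local_path)


m = PathMap()
on_disk = m.register("\u00e9.txt")   # server stores the NFC spelling
print(m.to_server(on_disk) == "\u00e9.txt")
```

The repository is never rewritten; only the client translates. The `ValueError` branch corresponds to the limitation Stefan mentions, a working copy that would contain the same name in more than one form.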
Re: Let's discuss about unicode compositions for filenames!
2012/2/3 Julian Foad julianf...@btopenworld.com:
> You may well be correct that NFC is never longer than NFD, but that's
> not the question. The question is whether NFC may be longer than the
> current paths (which are not normalized to normalization form C or to
> form D). And the answer is yes it may be longer. See
> http://unicode.org/faq/normalization.html#11.

Oh, I didn't know that. Thanks for letting me know. I also read all the other items in http://unicode.org/faq/normalization.html#11 and all of http://www.unicode.org/reports/tr15/ and learned more about normalization.

Maybe we should revise the note.
http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames

>> Here I quote from
>> http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
>>
>>   The proposed internal 'normal form' should be NFC, if only if
>>   it were because it's the most compact form of the two: when
>>   allocating memory to store a conversion result, it won't be
>>   necessary (ever) to allocate more than the size of the input
>>   buffer.
>
> That statement seems to be talking about converting between NFC and
> NFD, not from un-normalized to normalized.

Yes, indeed. So, we need to normalize input paths before processing. We choose NFC as the normalization form.

--
(Hiroaki Nakamura) hnaka...@gmail.com
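Julian's point can be checked with a short Python sketch. The example character is my own choice, not one named in the thread: U+0958 (DEVANAGARI LETTER QA) is a composition-excluded character, so its NFC form is the two-code-point sequence U+0915 U+093C, which is longer in UTF-8 than the unnormalized input.

```python
import unicodedata


def utf8_len(s):
    """Length of s in bytes when encoded as UTF-8."""
    return len(s.encode("utf-8"))


# Unnormalized input that grows under NFC (composition exclusion).
qa = "\u0958"
qa_nfc = unicodedata.normalize("NFC", qa)
print(utf8_len(qa), "->", utf8_len(qa_nfc))

# NFD input that shrinks under NFC, the case the note had in mind.
e_acute_nfd = "e\u0301"
e_acute_nfc = unicodedata.normalize("NFC", e_acute_nfd)
print(utf8_len(e_acute_nfd), "->", utf8_len(e_acute_nfc))
```

So a buffer sized to the input is enough when converting NFD to NFC, but not when the input is arbitrary unnormalized text.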
Re: Let's discuss about unicode compositions for filenames!
2012/2/3 Peter Samuelson pe...@p12n.org: [Hiroaki Nakamura] In option (2), we do n12n on all clients on all platforms, and we include web_dav_svn in clients. So we convert all input paths to the server encoding, which is NFC. Indeed. But the very concept of a server encoding means we are involving the server side. Which invokes a lot of difficult questions like what about existing 1.x clients, what about existing checkouts and what about existing repositories. Svn 1.7 forces me to upgrade existing 1.6 working copies. So we can let users to upgrade working copies. Existing repositories, I think it would be better to convert them too using svndump/svnload. And we change svnload to convert filenames to NFC. However in reality we cannot force users to convert every existing repository. So we need to change servers too. When servers read filenames from repositories, they first convert to NFC and then process commands. We also need to changes servers in order to deal with existing 1.x clients. We convert filenames to NFC when web_dav_svn and svnserve receive filenames from clients, they must first convert filenames to NFC. By proposing a client-only solution, I hope to avoid _all_ those questions. (Except what about existing checkouts - there would be a wc upgrade of some sort.) No recoding of existing repository paths is necessary. In my proposal, the only recoding that is done is on the client side, on a platform that does not support the original pathname (e.g., OS X HFS+ with a NFC path). All problems in computer science can be solved by another level of indirection. Mostly true. I can't tell if you quoted that as a point of support for my proposal, or as a point against it. Yes, with the mapping table, you can mangle filenames. However I think it is too complex for novice users. Users must care about the original filenames and the mangled filenames all the time. 
Well, there is no need to use this same proposal to also work around other filesystem limitations like avoiding : on Windows. It is just something that becomes _possible_. Also you must adapt all clients to use the mapping table. That is whole lot of work! Maybe you would create another version control system. By all clients I guess you mean all Subversion client libraries. Yes, that is the proposal. It would touch libsvn_wc and probably libsvn_client and libsvn_subr. Yes, like I said above, clients actually includes components that run on servers like web_dav_svn, and it should read as any components that access to repositories and working copies. We also need to change svnserve. So we'd better say all servers and clients. So even if Windows NTFS can have the same abstract filenames in both NFC and NFD simultaneously, we should avoid that, and we should only allow NFC filenames. This could be done, if we wanted to go to the trouble. Or we could just say use a pre-commit hook, like we tell people who want to prevent REAMDE and Reamde in a single dir. It is not the same level of interoperability problem as the one this thread is about. If you think in analogy to ASCII uppercase and lowercase examples, you miss the point. Please reread the Unicode Standard Annex #15 UAX #15: Unicode Normalization Forms http://unicode.org/reports/tr15/ Canonical equivalence is a fundamental equivalency between characters or sequences of characters that represent the same abstract character, and when correctly displayed should always have the same visual appearance and behavior. Figure 1 illustrates this equivalence. So, filenames in NFC and NFD are the equivalent, the same. README and readme are different. NFC/NFD and uppercase/lowercase are two different stories. Should we allow the same filenames in one directory? Of course not! If we allow that we go into really trouble and confusion. And OS X HSF+ does not allow that. 
So to support interoperability to OS X, we should not allow it in subversion too. -- )Hiroaki Nakamura) hnaka...@gmail.com
Re: Let's discuss about unicode compositions for filenames!
2012/2/3 Branko Čibej br...@xbc.nu:
> On 02.02.2012 20:59, Hiroaki Nakamura wrote:
>> So we need to change servers too. When servers read filenames from
>> repositories, they first convert to NFC and then process commands.
>
> That won't work. You have to do the initial lookup in a
> normalization-agnostic way, and neither BDB nor FSFS makes that
> possible without scanning whole directories.

OK, then do scan whole directories. If you do not want that, we force users to convert existing repositories. I think we must choose one of the two. Tough choices, but I cannot think of a better way, at least right now.

>> We also need to change servers in order to deal with existing 1.x
>> clients. We convert filenames to NFC when web_dav_svn and svnserve
>> receive filenames from clients, they must first convert filenames
>> to NFC.
>
> Actually, libsvn_repos; this has to work with ra_local as well. And it
> would have to maintain a table for converting results back to how the
> client knows them. This is the hard part to get right; imagine:
>
>   $ svn up
>   U čombe
>
> How will the server know if the client represents the č in the same
> encoding that the now-normalizing server sends? Will the client scan
> the directory and normalize the names to find the local file that
> needs updating?

Yes, without upgrading working copies, we must do that. If there is a better way, I would like to know. Please give us a better solution if you have any idea at all.

--
(Hiroaki Nakamura) hnaka...@gmail.com
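The directory scan Branko asks about, finding the local file whose name is canonically equivalent to the name the server sent, can be sketched as below. This is a Python illustration with a hypothetical helper name; it also shows why the lookup costs a pass over the whole directory instead of a single keyed lookup.

```python
import unicodedata


def find_local_entries(dir_entries, incoming_name):
    """Return the local entries canonically equivalent to incoming_name.

    Every entry has to be normalized and compared, so the cost is linear
    in the size of the directory.
    """
    wanted = unicodedata.normalize("NFC", incoming_name)
    return [e for e in dir_entries
            if unicodedata.normalize("NFC", e) == wanted]


# The server sends the NFC spelling; the local file system holds the NFD one.
local_dir = ["c\u030combe", "README"]
print(find_local_entries(local_dir, "\u010dombe"))
```

If the result ever contains more than one entry, the working copy has an NFC/NFD collision and the update cannot be applied unambiguously.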
Re: Let's discuss about unicode compositions for filenames!
2012/2/3 Daniel Shahaf danie...@elego.de: Branko Čibej wrote on Thu, Feb 02, 2012 at 21:03:47 +0100: On 02.02.2012 20:22, Peter Samuelson wrote: [Hiroaki Nakamura] In option (2), we do n12n on all clients on all platforms, and we include web_dav_svn in clients. So we convert all input paths to the server encoding, which is NFC. Indeed. But the very concept of a server encoding means we are involving the server side. Which invokes a lot of difficult questions like what about existing 1.x clients, what about existing checkouts and what about existing repositories. By proposing a client-only solution, I hope to avoid _all_ those questions. Can't see how that works, unless you either make the client-side solution optional, create a mapping table, or make name lookup on the server agnostic to character representation. I can't envision how any of those solutions would work all the time. It would be nice if we could normalize paths in the repository without having to perform a dump/reload cycle, but I don't know how that would work in FSFS It won't. Changing the encoding increase the length (in bytes) of the string (in the dirents hash, for example), and thus change the offsets of the node-revs that are later in the file --- to which subsequent revisions, and the id's of those node-revs, refer. Changes from NFD to NFC does not increase the length. The length will be same or smaller, not larger. Here I quote from http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames The proposed internal 'normal form' should be NFC, if only if it were because it's the most compact form of the two: when allocating memory to store a conversion result, it won't be necessary (ever) to allocate more than the size of the input buffer. -- )Hiroaki Nakamura) hnaka...@gmail.com
Re: Let's discuss about unicode compositions for filenames!
2012/2/3 Peter Samuelson pe...@p12n.org: [Hiroaki Nakamura] Existing repositories, I think it would be better to convert them too using svndump/svnload. And we change svnload to convert filenames to NFC. However in reality we cannot force users to convert every existing repository. Also note that if you convert a repository (via dump/load or whatever), all working copies based on the repository are invalidated and need to be re-checked-out. Avoiding _that_ problem would be really hairy, I think, very similar to the sort of work that would be needed to support obliterate without losing working copies. We also need to changes servers in order to deal with existing 1.x clients. We convert filenames to NFC when web_dav_svn and svnserve receive filenames from clients, they must first convert filenames to NFC. You keep saying what we must do on the server side. I propose something that is purely on the client side. It will solve the OS X / non-OS X interoperability problem. It will not solve every problem ever faced by a Subversion user. That's a job for 2.0. OK. When I started this thread, I suppose we'd like to focus to long term solution 2.x. That's because the short term solution options (4) written in http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames seems too diificult and complex for me. But if a modification to my proposal will fit in short term 1.x, I will modify it delightedly. Yes, like I said above, clients actually includes components that run on servers like web_dav_svn, and it should read as any components that access to repositories and working copies. No. By clients I mean components that run on the client side. If my proposal had required changes to mod_dav_svn, I would not have said strictly client-side. I do not propose any change to mod_dav_svn, svnserve, svnadmin, libsvn_repos, libsvn_fs, the repository data, or anything else on the server side. 
If you think in analogy to ASCII uppercase and lowercase examples, you miss the point. Please reread the Unicode Standard Annex #15 UAX #15: Unicode Normalization Forms http://unicode.org/reports/tr15/ Thanks, I've read it. The analogy stands. We could prevent NFC/NFD collisions as an additional service to users, something we have not done for the past 10 years. This would be along the lines of preventing users from shooting themselves in the foot. The actual _software_ problem that is solved by preventing collisions is the same as the software problem solved by preventing upper/lower case collisions: certain clients are unable to check out a folder that has such collisions. (Windows clients, in the case of upper/lower collisions; OS X clients, in the case of NFC/NFD collisions.) Yes, I agree with that. I think we are talking past each other. You are trying to solve two distinct but related problems: 1. OS X client-side confusion when faced with a non-NFD repository path; 2. NFC/NFD collisions. I am only trying to solve problem 1. I'm ignoring problem 2 for two reasons: (a) Problem 2 requires server-side work and complex compatibility / upgrade scenarios (dump/load, re-check-out all wcs, etc). (b) Problem 2 can be worked around, for new repositories (or repositories with no existing collisions), with a pre-commit hook. ...neither of which are true for my proposal to solve problem 1. So long as you continue to insist that, to solve problem 1, we must also solve problem 2, I'm pretty sure we will never come to any agreement. OK. So how about changing my proposal like: (1) No sever modification. Just modify svn_path_cstring_to_utf8 only. (2) Let users install a pre-commit hook which rejects any non-NFC filenames. In this way, we only need one function. 
Modification is just like the original OS X unicode path patch: utf8precompose_macosx_2.patch http://subversion.tigris.org/nonav/issues/showattachment.cgi/813/utf8precompose_macosx_2.patch in http://subversion.tigris.org/issues/show_bug.cgi?id=2464 Only difference the original patch to my patch will be mine use utf8proc so that we can use it on all platforms, Mac OS X, Windows and Linux. -- )Hiroaki Nakamura) hnaka...@gmail.com
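The core of step (2), a pre-commit hook that rejects non-NFC filenames, might look like this Python sketch. It assumes the hook has already obtained the list of changed paths (for example by parsing `svnlook changed` output, which is not shown here); `non_nfc_paths` and `check_commit` are hypothetical names.

```python
import unicodedata


def non_nfc_paths(paths):
    """Return the paths that are not already in NFC."""
    return [p for p in paths if unicodedata.normalize("NFC", p) != p]


def check_commit(paths):
    """Return (exit_code, message) in the way a hook script would report it.

    A non-zero exit code makes Subversion reject the commit.
    """
    bad = non_nfc_paths(paths)
    if bad:
        listing = ", ".join(repr(p) for p in bad)
        return 1, "Commit rejected: filenames not in NFC: " + listing
    return 0, ""


# Hypothetical commit touching one NFD-named file.
code, message = check_commit(["trunk/e\u0301.txt", "trunk/README"])
print(code, message)
```

Such a hook only protects new commits; paths already in the repository are not examined.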
Let's discuss about unicode compositions for filenames!
Hi folks!

I read the note about unicode compositions for filenames
http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
and would like to drive the discussion.

First, for me, the short-term solution (4) seems too difficult to implement. It is very complex and error-prone, so here I focus on the long-term solution (2).

It is simple. We convert all input paths into the 'normal' normal form (NFC), using utf8proc.
http://www.public-software-group.org/utf8proc

I made a quick-and-dirty proof-of-concept patch for further discussion. If you run apache + mod_dav_svn with this patch, NFD filenames in commits made by an svn client without this patch will be converted to NFC.

This patch has the following limitations right now, but we can fix them.

- It does not handle all input paths, only two: one for mod_dav_svn open_stream, one for svn_path_cstring_to_utf8.
- The error handling is not yet implemented.
- The configure script should be modified for linking against the utf8proc library. Currently it needs EXTRA_LDFLAGS=-lutf8proc when running make.

To test this patch, please do the steps below.

(1) Build and install utf8proc.

The example below is for Scientific Linux 6.1 x86_64. Currently I install utf8proc to system library locations (/usr/include and /usr/lib64), not places like /usr/local/include and /usr/local/lib64, just because I don't want to bother with modifying the configure script right now.

  wget http://www.public-software-group.org/pub/projects/utf8proc/v1.1.5/utf8proc-v1.1.5.tar.gz
  tar xf utf8proc-v1.1.5.tar.gz
  cd utf8proc-v1.1.5
  make c-library
  sudo install -m 644 libutf8proc.so /usr/lib64/libutf8proc.so.1.1.5
  sudo ln -s libutf8proc.so.1.1.5 /usr/lib64/libutf8proc.so.1
  sudo ln -s libutf8proc.so.1 /usr/lib64/libutf8proc.so
  sudo install -m 644 utf8proc.h /usr/include

(2) Build Subversion 1.7.2 with this patch.

  cd subversion-1.7.2
  patch -p1 < ../subversion-1.7.2-NFC.diff
  ./configure
  EXTRA_LDFLAGS=-lutf8proc make
  sudo make install

One thing I'd like to discuss is how we link to utf8proc. There are two options.

(1) Install utf8proc as a shared library and modify the configure script to have a --with-utf8proc option.
(2) Copy the utf8proc source files into the Subversion source directories and link statically (like sqlite-amalgamation).

Option (1) needs a utf8proc package to be created for each OS distribution, and the dependencies of the subversion package to be modified. I think this is the ideal way, but that is a lot of work.

I think option (2) is easier: put the utf8proc source files in the Subversion source tarballs.

Am I on the right track? Let's discuss and fix this problem and we will be happy ever after!

--
(Hiroaki Nakamura) hnaka...@gmail.com

subversion-1.7.2-NFC.diff

diff -ruN subversion-1.7.2.orig/subversion/include/svn_utf.h subversion-1.7.2/subversion/include/svn_utf.h
--- subversion-1.7.2.orig/subversion/include/svn_utf.h 2009-11-17 04:07:17.0 +0900
+++ subversion-1.7.2/subversion/include/svn_utf.h 2012-01-29 11:54:20.150665621 +0900
@@ -220,6 +220,14 @@
                                   const svn_string_t *src,
                                   apr_pool_t *pool);
 
+/** Set @a *dest to a NFC canonicalized C string from string @a src;
+ * allocate @a *dest in @a pool.
+ */
+svn_error_t *
+svn_utf_cstring_NFC(const char **dest,
+                    const char *src,
+                    apr_pool_t *pool);
+
 #ifdef __cplusplus
 }
 #endif /* __cplusplus */
diff -ruN subversion-1.7.2.orig/subversion/libsvn_subr/path.c subversion-1.7.2/subversion/libsvn_subr/path.c
--- subversion-1.7.2.orig/subversion/libsvn_subr/path.c 2011-01-18 06:45:39.0 +0900
+++ subversion-1.7.2/subversion/libsvn_subr/path.c 2012-01-29 18:01:06.900398904 +0900
@@ -1119,15 +1119,17 @@
                          const char *path_apr,
                          apr_pool_t *pool)
 {
+  char *path_nfc;
+  SVN_ERR(svn_utf_cstring_NFC(&path_nfc, path_apr, pool));
   svn_boolean_t path_is_utf8;
   SVN_ERR(get_path_encoding(&path_is_utf8, pool));
   if (path_is_utf8)
     {
-      *path_utf8 = apr_pstrdup(pool, path_apr);
+      *path_utf8 = apr_pstrdup(pool, path_nfc);
       return SVN_NO_ERROR;
     }
   else
-    return svn_utf_cstring_to_utf8(path_utf8, path_apr, pool);
+    return svn_utf_cstring_to_utf8(path_utf8, path_nfc, pool);
 }
 
 
diff -ruN subversion-1.7.2.orig/subversion/libsvn_subr/utf.c subversion-1.7.2/subversion/libsvn_subr/utf.c
--- subversion-1.7.2.orig/subversion/libsvn_subr/utf.c 2011-08-24 00:04:38.0 +0900
+++ subversion-1.7.2/subversion/libsvn_subr/utf.c 2012-01-29 17:55:33.643895922 +0900
@@ -42,6 +42,7 @@
 #include "private/svn_utf_private.h"
 #include "private/svn_dep_compat.h"
 #include "private/svn_string_private.h"
+#include "utf8proc.h"
 
 
 
@@ -1029,3 +1030,58 @@
 
   return err;
 }
+
+static ssize_t svn_utf_map(
+  const uint8_t *str, ssize_t len, uint8_t **dstptr, int options,
+  apr_pool_t *pool
+) {
+  int32_t