Re: Let's discuss about unicode compositions for filenames!
2012/2/9 Markus Schaber m.scha...@3s-software.com:
> Hi,
>
> From: Stefan Sperling [mailto:s...@elego.de]
>> On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote:
>> [Upgrade options / backwards compatibility for proposed unicode normalization fix]
>> - Need to re-checkout existing working copies of the repository?
>>   => Yes, but only if config is changed from the default.
>
> Maybe this could even be avoided if newer clients (or a utility script) can upgrade the working copy to the normalized format.

Yes, if the working copy does not have filename collisions. However, for compatibility, we cannot let newer clients upgrade working copies automatically, because existing working copies may have filename collisions.

> Best regards
> Markus Schaber

--
中村 弘輝 (Hiroaki Nakamura) hnaka...@gmail.com
Re: Let's discuss about unicode compositions for filenames!
Hi,

2012/2/9 Thomas Åkesson tho...@akesson.cc:
> Hi,
>
> I have been interested in this issue for a couple of years and I remember it was discussed briefly at Subconf in Germany a couple of years ago.
>
> Branching the thread here because I'd like to propose a different approach than Hiroaki. This proposition is not very different from the note unicode-composition-for-filenames or what Peter S, Neels and others suggested, perhaps just combining 2 changes slightly differently. This is based on my limited understanding of WC-NG, please correct me if I make incorrect assumptions.
>
> - Server will still accept both NFC and NFD, however, it will no longer accept collisions. Enforced by normalising to NFD before uniqueness checks during add operations (yes, might be more expensive). There will be no unified normalisation, but the subversion server will work like most filesystems; return what was given to it.

For compatibility, we cannot ignore existing repositories and working copies that have filename collisions, so we cannot force Subversion servers and clients to normalize filenames. We must let users choose, per repository, whether filenames are normalized.

--
(Hiroaki Nakamura) hnaka...@gmail.com
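The uniqueness check described in the quoted proposal (normalise to NFD only for the comparison, keep storing whatever name was given) can be sketched in a few lines. This is a minimal illustration in Python, not Subversion code; the function name `find_normalization_collisions` is hypothetical.

```python
import unicodedata


def find_normalization_collisions(names):
    """Group names that become identical once normalized to NFD.

    The original spelling of each name is kept; NFD is used only as
    the comparison key, as in the proposal.
    """
    groups = {}
    for name in names:
        key = unicodedata.normalize("NFD", name)
        groups.setdefault(key, []).append(name)
    # Only groups with more than one member are collisions.
    return [group for group in groups.values() if len(group) > 1]


nfc_name = "caf\u00e9.txt"   # precomposed "e with acute" (NFC)
nfd_name = "cafe\u0301.txt"  # "e" followed by a combining acute accent (NFD)
collisions = find_normalization_collisions([nfc_name, nfd_name, "readme.txt"])
```

Here the two spellings of the same visible name end up in one group, while `readme.txt` is not reported.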
Re: Let's discuss about unicode compositions for filenames!
On 11.02.2012 13:05, Hiroaki Nakamura wrote:
> 2012/2/9 Markus Schaber m.scha...@3s-software.com:
>> Hi,
>>
>> From: Stefan Sperling [mailto:s...@elego.de]
>>> On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote:
>>> [Upgrade options / backwards compatibility for proposed unicode normalization fix]
>>> - Need to re-checkout existing working copies of the repository?
>>>   => Yes, but only if config is changed from the default.
>>
>> Maybe this could even be avoided if newer clients (or a utility script) can upgrade the working copy to the normalized format.
>
> Yes, if the working copy does not have filename collisions. However, for compatibility, we cannot let newer clients upgrade working copies automatically because existing working copies may have filename collisions.

That's not entirely true, since we can detect the collisions in advance, and a partially upgraded working copy would still work.

From a practical point of view, it's very, very unlikely that there would be any such collisions in a valid working copy. People would tend to notice. :)

-- Brane
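Detecting collisions in advance and upgrading only the names that are safe could look roughly like the sketch below. It is an illustration under stated assumptions, not working-copy code: the target form is assumed to be NFC (the thread does not fix one), and `plan_upgrade` is a hypothetical helper that only computes a plan and touches no files.

```python
import unicodedata


def plan_upgrade(names, form="NFC"):
    """Return (renames, skipped) for normalizing NAMES to FORM.

    renames: (old, new) pairs that can be applied safely.
    skipped: names left untouched because at least two distinct
             names share the same normalized form.
    """
    by_target = {}
    for name in names:
        target = unicodedata.normalize(form, name)
        by_target.setdefault(target, []).append(name)

    renames, skipped = [], []
    for target, sources in by_target.items():
        if len(sources) > 1:
            skipped.extend(sources)      # collision: leave all of them alone
        elif sources[0] != target:
            renames.append((sources[0], target))
    return renames, skipped


names = ["cafe\u0301.txt", "u\u0308ber.txt", "\u00fcber.txt", "plain.txt"]
renames, skipped = plan_upgrade(names)
```

Names in `skipped` stay as they are, which corresponds to the "partially upgraded" state mentioned above.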
Re: Let's discuss about unicode compositions for filenames!
2012/2/11 Branko Čibej br...@apache.org:
> On 11.02.2012 13:05, Hiroaki Nakamura wrote:
>> 2012/2/9 Markus Schaber m.scha...@3s-software.com:
>>> From: Stefan Sperling [mailto:s...@elego.de]
>>>> On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote:
>>>> [Upgrade options / backwards compatibility for proposed unicode normalization fix]
>>>> - Need to re-checkout existing working copies of the repository?
>>>>   => Yes, but only if config is changed from the default.
>>>
>>> Maybe this could even be avoided if newer clients (or a utility script) can upgrade the working copy to the normalized format.
>>
>> Yes, if the working copy does not have filename collisions. However, for compatibility, we cannot let newer clients upgrade working copies automatically because existing working copies may have filename collisions.
>
> That's not entirely true, since we can detect the collisions in advance, and a partially upgraded working copy would still work.
>
> From a practical point of view, it's very, very unlikely that there would be any such collisions in a valid working copy. People would tend to notice. :)

Yes, I agree wholeheartedly! At work, I have noticed a few repositories that contain both NFC and NFD filenames. However, as far as I know, none of them has collisions.

--
(Hiroaki Nakamura) hnaka...@gmail.com
Re: 1.7.3 up for signing / testing
Hyrum K Wright hyrum.wri...@wandisco.com writes:
> At long last, here are the 1.7.3 tarballs for testing and signing:
> http://people.apache.org/~hwright/svn/1.7.3/

I'm getting intermittent failures in check-swig-rb on Ubuntu 10.04. Some runs complete without error and some produce errors (not always the same tests):

1) Failure:
test_revision_status(SvnWcTest)
/home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/test_wc.rb:656:in `test_revision_status'
/home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/util.rb:202:in `make_context'
/home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/test_wc.rb:645:in `test_revision_status':
-1 expected but was 2.

1) Failure:
test_delta_with_deprecated_api(SvnFsTest)
/home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/test_fs.rb:373:in `test_delta'
/home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/util.rb:202:in `make_context'
/home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/test_fs.rb:359:in `test_delta'
/home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/test_fs.rb:407:in `test_delta_with_deprecated_api':
A\n\n\n\nE\n expected but was a\n\n\n\ne\n.

--
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com
Re: 1.7.3 up for signing / testing
Philip Martin philip.mar...@wandisco.com writes:
> Hyrum K Wright hyrum.wri...@wandisco.com writes:
>> At long last, here are the 1.7.3 tarballs for testing and signing:
>> http://people.apache.org/~hwright/svn/1.7.3/
>
> I'm getting intermittent failures in check-swig-rb on Ubuntu 10.04. Some runs complete without error and some produce errors (not always the same tests):
>
> 1) Failure:
> test_revision_status(SvnWcTest)
> /home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/test_wc.rb:656:in `test_revision_status'
> /home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/util.rb:202:in `make_context'
> /home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/test_wc.rb:645:in `test_revision_status':
> -1 expected but was 2.

I'm getting similar FAILs with 1.7.2, so this is not a regression.

I only see FAILs when running the tests on a ramdisk; I've run the tests several times on a hard drive and not seen a single FAIL.
Re: 1.7.3 up for signing / testing
Summary: +1 to release

Platform: Linux (Debian/squeeze and Ubuntu/10.04)

Tested:
  (local, svn, svn+sasl, serf, neon) x (fsfs, fsfs/pack/shard, bdb)
  (serf/v1, neon/v1) x (fsfs, bdb)
  swig-pl, swig-py, swig-rb
  javahl x (fsfs, bdb)

Results: All tests PASS

Local dependencies Debian:
  apache2-threaded-dev : 2.2.16-6+squeeze4
  libapr1-dev          : 1.4.2-6+squeeze3
  libaprutil1-dev      : 1.3.9+dfsg-5
  libdb4.8-dev         : 4.8.30-2
  libneon27-dev        : 0.29.3-3
  libsasl2-dev         : 2.1.23.dfsg1-7
  libsqlite3-dev       : 3.7.3-1
  perl                 : 5.10.1-17squeeze2
  python2.6-dev        : 2.6.6-8+b1
  ruby1.8-dev          : 1.8.7.302-2squeeze1
  openjdk-6-jdk        : 6b18-1.8.9-0.1~sqeeze1
  serf                 : trunk@1564

Local dependencies Ubuntu:
  apache2-threaded-dev : 2.2.14-5ubuntu8.7
  libapr1-dev          : 1.3.8-1ubuntu0.3
  libaprutil1-dev      : 1.3.9+dfsg-3ubuntu0.10.04.1
  libdb4.8-dev         : 4.8.24-1ubuntu1
  libneon27-gnutls-dev : 0.29.0-1
  libsasl2-dev         : 2.1.23.dfsg1-5ubuntu1
  libsqlite3-dev       : 3.6.22-1
  perl                 : 5.10.1-8ubuntu2.1
  python2.6-dev        : 2.6.5-1ubuntu6
  ruby1.8-dev          : 1.8.7.249-2
  openjdk-6-jdk        : 6b20-1.9.10-0ubuntu1~10.04.3
  serf                 : trunk@1564

subversion-1.7.3.tar.bz2
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQEcBAABCAAGBQJPNqpYAAoJEHbXiOHtGlmcbPkH/AvtnbJ8+Nz/kCEVW0ojSzHg
bLS1vevdMLbLsVHIBXMiE2UOHLUoiNJkaloSRw0YToldmyCOGMKtYWuOFruycgcq
MOOfuSZqCBG+Z6sfDN9U7Bpkmfv6wIVGc+lo6ZxrM5N7hnTr7vo4YOAq1lcTQx5Q
jAxOQxU9xb3i7yi46ynMdRhjByuealUujlnJyuZWNx6v8TQiv6gSPMEV3X7Fgcr4
MgXjC12zXn3SsQoSx0aQn7WpIjBheM9n0/EceZDpdQfLmS022btSzhLAz573q6T6
pP5eJQ2qVu+VjACL9tC8lqQw6M29RAcbUq2PX91X7iVYEreczP51u1t4E/fFPsk=
=F7Pd
-----END PGP SIGNATURE-----

subversion-1.7.3.tar.gz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQEcBAABCAAGBQJPNqp+AAoJEHbXiOHtGlmc/J4H/AxFVKCS3A+HTc9tvDOLEmWi
18awhyiZcisyUcw3M8+X4qVG1O+ik0QnpOKsb7Vnfz3eDrFMSL8mitl6WCZJGxiL
zS431QOSAy+EmDuTQvhz8RQa+G1F6GpPHX3nCDYhwPxpNMBYq/rlNg7efnjrOM79
GBSo0wf9jTMV23oGITptoYtcCiYjUFY+cQFhz6nK2J/n8wkXg18vFjKLpJm6Vwrl
/0KN7tOPbB8+PvbEc7tA5SVxeoy1lXwY85o6wUD49OWZWn7m6Da0w8e2r7NdtqoI
DCO8+QEsUX3AJgSITFWE8kMntQFymDrC7RSXbwyHVL1GlpzTUvViY2YXRZ9k8xM=
=0oqV
-----END PGP SIGNATURE-----
Re: [RFC] Inheritable Properties
On 10.02.2012 12:21, Branko Čibej wrote:
> On 07.02.2012 22:24, Stefan Fuhrmann wrote:
>> On 07.02.2012 00:41, Greg Stein wrote:
>>> In most data storage mechanisms for the repository, inheritable properties are a performance killer.
>>
>> I'm not sure that this is actually applicable to SVN for two reasons: (1) we use deltification
>
> I have absolutely no idea how deltification helps with inheritable properties.

Obviously. There are two important points to make here. First, a system like Subversion *must* use some sort of fragmented / multi-step data access. Indirect access to properties is not something extraordinary here. Second, access to large databases is dominated by their physical organization. Details on both points below.

>> (2) we often handle whole file trees
>
> Neither here nor there.

On the contrary. This is an essential difference from e.g. NTFS. Subversion reads individual nodes only very rarely, while most OSes can open single files only. Checking for props, reading and finally evaluating them must be as fast as possible. Inherited properties eliminate the need to read props on most nodes (only checking that there is no local override). Even the evaluation of e.g. inherited ACLs may be skipped if the semantics have been chosen appropriately. This is a perfect example of how elimination of redundancy (e.g. by deltification) improves performance rather than incurring a penalty.

> Inheritable properties would be /relatively/ less of a killer in SVN backends because we're already doing lookups the silly way, i.e., a lookup for /a/b/c will resolve and read /a and /a/b while searching for .../c, so it's not much extra work to keep the current values of inheritable properties in the lookup context.

The silly part of FSFS is that it does not optimize access paths yet, but stores changes individually. The challenge is our two-dimensional key space and the fact that different operations traverse the data along different dimensions (e.g. log ./. checkout). With my latest commit, the caching code allows for more or less O(1) access / O(n) traversal along these dimensions.

> A proper lookup would jump straight to /a/b/c without examining the intermediate directories, and /then/ it would have to climb back up the tree to find inheritable props (or ACLs, same difference in this case). For a real filesystem, that's definitely a performance killer, and the reason why NTFS fakes ACL inheritance. The assumption is that you'll be changing inheritable ACLs a lot less often than you will be reading them, so the storage/performance tradeoff is definitely worth it.

Question: how many entries would a direct lookup structure need to have (i.e. path@rev -> data pointer)? Keep in mind that many valid paths like /branches/foo/bar will never be mentioned anywhere in a SVN repository because they never got touched under that name. A rough estimate for a fully expanded list of entries is

  #nodesInTrunk * #openBranches * #revisions

This yields 10^9 entries for small repositories and 10^14 for KDE-sized ones. Clearly impractical.

Even NTFS does not attempt a direct mapping but uses a tree structure and simply hopes to cache enough nodes to make access performance acceptable. The differences from FSFS are details of the tree representation.

> I suspect the situation in SVN FS is quite similar, and if we restructured the way the directory tree is represented to something similar to how WC-NG (or Mercurial) does it, these issues would suddenly become more important.

For the working copy, things are different because we are more likely to access single items and we need to support data changes. The latter calls for more flexible / generic data structures than the r/o data backend, where small size can be made to equal high performance. Storage / performance tradeoffs on the *client side* are plausible, though.

-- Stefan^2.
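The size estimate for a fully expanded path@rev table is a plain product, so it is easy to replay. In the sketch below the input figures are made-up round numbers, chosen only so that the results land on the orders of magnitude quoted in the message; they are assumptions, not measurements of any real repository.

```python
def expanded_entries(nodes_in_trunk, open_branches, revisions):
    """Entries in a fully expanded path@rev -> data lookup table."""
    return nodes_in_trunk * open_branches * revisions


# Hypothetical inputs (assumptions for illustration only).
small_repo = expanded_entries(nodes_in_trunk=1_000,
                              open_branches=10,
                              revisions=100_000)
large_repo = expanded_entries(nodes_in_trunk=100_000,
                              open_branches=1_000,
                              revisions=1_000_000)
```

With these inputs `small_repo` is 10^9 and `large_repo` is 10^14, which shows how quickly the product grows once every possible path at every revision gets its own entry.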
Re: svn commit: r1241718 - in /subversion/trunk/subversion/libsvn_fs_fs: caching.c fs.h fs_fs.c
On 09.02.2012 16:05, Daniel Shahaf wrote:
> stef...@apache.org wrote on Wed, Feb 08, 2012 at 00:44:26 -:
>> Author: stefan2
>> Date: Wed Feb 8 00:44:26 2012
>> New Revision: 1241718
>>
>> URL: http://svn.apache.org/viewvc?rev=1241718&view=rev
>> Log:
>> Major improvement in delta window handling: Cache intermediate combined
>> delta windows such that changes close by won't need to discover and read
>> the whole chain again.
>>
>> For algorithms that traverse history linearly, this optimization gives
>> delta combination an amortized constant runtime.
>>
>> For now, we only cache representations < 100kB. Support for larger
>> reps can be added later.
>>
>> * subversion/libsvn_fs_fs/fs.h
>>   (fs_fs_data_t): add cache for combined windows
>> * subversion/libsvn_fs_fs/caching.c
>>   (svn_fs_fs__initialize_caches): initialize that cache
>> * subversion/libsvn_fs_fs/fs_fs.c
>>   (rep_state): add reference to new cache
>>   (create_rep_state_body): init that reference
>>   (rep_read_baton): add reference to cached base window
>>   (get_cached_combined_window, set_cached_combined_window): new utility functions
>>   (build_rep_list): terminate delta chain early if cached base window is available
>>   (rep_read_get_baton): adapt caller
>>   (get_combined_window): re-implement
>>   (get_contents): handle new special case; adapt to get_combined_window() signature changes
>>
>> Modified:
>>     subversion/trunk/subversion/libsvn_fs_fs/caching.c
>>     subversion/trunk/subversion/libsvn_fs_fs/fs.h
>>     subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
>
> I haven't reviewed this, but a question:
>
>> +/* Read the WINDOW_P for the rep state RS from the current FSFS session's
>> + * cache. This will be a no-op and IS_CACHED will be set to FALSE if no
>> + * cache has been given. If a cache is available IS_CACHED will inform
>> + * the caller about the success of the lookup. Allocations (of the window
>> + * in particualar) will be made from POOL.
>> + */
>> +static svn_error_t *
>> +get_cached_combined_window(svn_stringbuf_t **window_p,
>> +                           struct rep_state *rs,
>> +                           svn_boolean_t *is_cached,
>> +                           apr_pool_t *pool)
>> +{
>> +  if (! rs->combined_cache)
>> +    {
>> +      /* txdelta window has not been enabled */
>> +      *is_cached = FALSE;
>> +    }
>> +  else
>> +    {
>> +      /* ask the cache for the desired txdelta window */
>> +      return svn_cache__get((void **)window_p,
>> +                            is_cached,
>> +                            rs->combined_cache,
>> +                            get_window_key(rs, rs->start, pool),
>
> How does the cache key identify the particular combined window being cached?

Undeltified windows use the same key as their deltified representation; basically revision file + offset. The distinction between deltified and un-deltified is made by the cache instance prefix.

> get_window_key() may return "". If it returns "" when called as an argument to svn_cache__set() and then also here, won't the cache return a wrong result?

There is a comment in get_window_key() for this case. "" will only be returned if we can't get the name of the open APR file. This is virtually impossible. If it happens anyway, the lookup will hit the deltified window cache first and we will report a repository corruption.

But maybe we should change the cache API definition to support and reject NULL keys. get_window_key() could then simply return NULL and could do with fewer assumptions.

-- Stefan^2.
Re: svn commit: r1241718 - in /subversion/trunk/subversion/libsvn_fs_fs: caching.c fs.h fs_fs.c
Stefan Fuhrmann wrote on Sun, Feb 12, 2012 at 03:06:31 +0100:
> On 09.02.2012 16:05, Daniel Shahaf wrote:
>> stef...@apache.org wrote on Wed, Feb 08, 2012 at 00:44:26 -:
>>> Author: stefan2
>>> Date: Wed Feb 8 00:44:26 2012
>>> New Revision: 1241718
>>>
>>> URL: http://svn.apache.org/viewvc?rev=1241718&view=rev
>>> Log:
>>> Major improvement in delta window handling: Cache intermediate combined
>>> delta windows such that changes close by won't need to discover and read
>>> the whole chain again.
>>>
>>> For algorithms that traverse history linearly, this optimization gives
>>> delta combination an amortized constant runtime.
>>>
>>> For now, we only cache representations < 100kB. Support for larger
>>> reps can be added later.
>>>
>>> * subversion/libsvn_fs_fs/fs.h
>>>   (fs_fs_data_t): add cache for combined windows
>>> * subversion/libsvn_fs_fs/caching.c
>>>   (svn_fs_fs__initialize_caches): initialize that cache
>>> * subversion/libsvn_fs_fs/fs_fs.c
>>>   (rep_state): add reference to new cache
>>>   (create_rep_state_body): init that reference
>>>   (rep_read_baton): add reference to cached base window
>>>   (get_cached_combined_window, set_cached_combined_window): new utility functions
>>>   (build_rep_list): terminate delta chain early if cached base window is available
>>>   (rep_read_get_baton): adapt caller
>>>   (get_combined_window): re-implement
>>>   (get_contents): handle new special case; adapt to get_combined_window() signature changes
>>>
>>> Modified:
>>>     subversion/trunk/subversion/libsvn_fs_fs/caching.c
>>>     subversion/trunk/subversion/libsvn_fs_fs/fs.h
>>>     subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
>>
>> I haven't reviewed this, but a question:
>>
>>> +/* Read the WINDOW_P for the rep state RS from the current FSFS session's
>>> + * cache. This will be a no-op and IS_CACHED will be set to FALSE if no
>>> + * cache has been given. If a cache is available IS_CACHED will inform
>>> + * the caller about the success of the lookup. Allocations (of the window
>>> + * in particualar) will be made from POOL.
>>> + */
>>> +static svn_error_t *
>>> +get_cached_combined_window(svn_stringbuf_t **window_p,
>>> +                           struct rep_state *rs,
>>> +                           svn_boolean_t *is_cached,
>>> +                           apr_pool_t *pool)
>>> +{
>>> +  if (! rs->combined_cache)
>>> +    {
>>> +      /* txdelta window has not been enabled */
>>> +      *is_cached = FALSE;
>>> +    }
>>> +  else
>>> +    {
>>> +      /* ask the cache for the desired txdelta window */
>>> +      return svn_cache__get((void **)window_p,
>>> +                            is_cached,
>>> +                            rs->combined_cache,
>>> +                            get_window_key(rs, rs->start, pool),
>>
>> How does the cache key identify the particular combined window being cached?
>
> Undeltified windows use the same key as their deltified representation; basically revision file + offset. The distinction between deltified and un-deltified is made by the cache instance prefix.

What revision file and what offset, and how do they relate to the window object contained in the cache?

(I'm going to guess that the key is the rev-file/offset of a rep that generates the same fulltext as the cached window; but I shouldn't have to guess.)

>> get_window_key() may return "". If it returns "" when called as an argument to svn_cache__set() and then also here, won't the cache return a wrong result?
>
> There is a comment in get_window_key() for this case. "" will only be returned if we can't get the name of the open APR file. This is virtually impossible. If it happens anyways, it will hit the deltified window cache first, we will report a repository corruption.

In plain English: there is an unlikely, but not impossible, scenario where the only thing between your new code and silent corruption (specifically: incorrect retrieval of a fulltext) is the order of lookups in two different caches. That sounds awfully brittle to me, and the sensitivity of the lookup order is not documented anywhere.

> But maybe we should change the cache API definition to support and reject NULL keys. get_window_key() could then return simply NULL and could do with fewer assumptions.

What does "support and reject" mean? That trying to get(key=NULL) always returns "not found" and trying to set(key=NULL) doesn't change the cache's state?

> -- Stefan^2.
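The reading of "support and reject" asked about above (a lookup with a NULL key always misses, a store with a NULL key changes nothing) can be written down as a small model. This is only an illustration of that interpretation in Python, with `None` standing in for NULL; `KeyRejectingCache` is a hypothetical class and does not mirror the svn_cache__* API.

```python
class KeyRejectingCache:
    """Toy cache that accepts None as a key but never stores under it."""

    def __init__(self):
        self._entries = {}

    def get(self, key):
        """Return (value, found); a None key is always reported as not found."""
        if key is None:
            return None, False
        if key in self._entries:
            return self._entries[key], True
        return None, False

    def set(self, key, value):
        """Store VALUE under KEY; a None key leaves the cache unchanged."""
        if key is None:
            return
        self._entries[key] = value

    def __len__(self):
        return len(self._entries)


cache = KeyRejectingCache()
cache.set(None, "window that must never be served")
cache.set("rev-file:offset", "combined window")
```

Under this reading a key generator that fails can return `None`, and no caller can get back data that was stored for a different item.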
Re: [RFC] Inheritable Properties
On 12.02.2012 02:52, Stefan Fuhrmann wrote:
> The silly part of FSFS is that it does not optimize access paths, yet, but stores changes individually. The challenge is our two-dimensional key space and the fact that different operations traverse the data along different dimensions (e.g. log ./. checkout).

Interestingly enough, the 2D keyspace isn't that big a problem. The real issue is that we don't even represent all the relevant keys, because of the lazy copying of subtrees. That's what actually prevents us from doing one-shot lookups of arbitrary path@rev; and even then, we'd only really have to do a step-by-step top-down resolve if the initial fast lookup failed.

> Question: how many entries would a direct lookup structure need to have (i.e. path@rev -> data pointer)? Keep in mind that many valid paths like /branches/foo/bar will never be mentioned anywhere in a SVN repository because they never got touched under that name. A rough estimation for a fully expanded list of entries is
>
>   #nodesInTrunk * #openBranches * #revisions
>
> This yields 10^9 entries for small repositories and 10^14 for KDE-sized ones. Clearly impractical.

So that's not how you do it. :) You'd only need one reference per actual node, not per possible node lookup path including revisions. Obviously you have to resolve path@rev to a concrete node before you can do anything with its attributes. With the caveat that there's a nasty edge case involving not-yet-lazy-copied child nodes; but given the way we currently crawl the tree, that's not really an issue.

-- Brane
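The "fast lookup first, top-down resolve as fallback" idea can be illustrated with a toy tree. Everything in this sketch is an assumption made for illustration: nodes are plain dicts, revisions are left out entirely, and `resolve` is a hypothetical helper rather than anything in the FS code. The lazy copy is modelled by letting /branches/foo point at the very same node object as /trunk.

```python
def resolve(path, index, root):
    """Return the node for PATH.

    Try the direct index (one entry per actual node) first; if the
    path was never recorded under that name, walk down from ROOT.
    """
    node = index.get(path)
    if node is not None:
        return node
    node = root
    for component in path.split("/"):
        if component:                      # skip the empty leading piece
            node = node["children"][component]
    return node


bar = {"children": {}}
trunk = {"children": {"bar": bar}}
# /branches/foo is a lazy copy: it shares the trunk node instead of
# duplicating the subtree, so /branches/foo/bar has no index entry.
root = {"children": {"trunk": trunk,
                     "branches": {"children": {"foo": trunk}}}}
index = {"/trunk": trunk, "/trunk/bar": bar}

direct = resolve("/trunk/bar", index, root)
via_walk = resolve("/branches/foo/bar", index, root)
```

Both calls end at the same node; only the second needs the step-by-step walk, because the branch path never appears in the index.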