Re: Let's discuss about unicode compositions for filenames!

2012-02-11 Thread Hiroaki Nakamura
2012/2/9 Markus Schaber m.scha...@3s-software.com:
 Hi,

 Von: Stefan Sperling [mailto:s...@elego.de]

 On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote:
  [Upgrade options / backwards compatibility for proposed unicode 
  normalization fix]

 - Need to re-checkout existing working copies of the repository?
   = Yes, but only if config is changed from the default.

 Maybe this could even be avoided if newer clients (or a utility script) can 
 upgrade the working copy to the normalized format.

Yes, if the working copy does not have filename collisions. However,
for compatibility, we cannot let newer clients upgrade working copies
automatically, because existing working copies may have filename
collisions.



-- 
中村 弘輝 (Hiroaki Nakamura) hnaka...@gmail.com


Re: Let's discuss about unicode compositions for filenames!

2012-02-11 Thread Hiroaki Nakamura
Hi,

2012/2/9 Thomas Åkesson tho...@akesson.cc:
 Hi,
 I have been interested in this issue for a couple of years and I remember it 
 was discussed briefly at Subconf in Germany a couple of years ago.

 Branching the thread here because I'd like to propose a different approach 
 than Hiroaki. This proposition is not very different from the note 
 unicode-composition-for-filenames or what Peter S, Neels and others 
 suggested, perhaps just combining 2 changes slightly differently.

 This is based on my limited understanding of WC-NG, please correct me if I 
 make incorrect assumptions.

 - Server will still accept both NFC and NFD, however, it will no longer 
 accept collisions. Enforced by normalising to NFD before uniqueness checks 
 during add operations (yes, might be more expensive). There will be no 
 unified normalisation, but the subversion server will work like most 
 filesystems; return what was given to it.

For compatibility, we cannot ignore existing repositories and working
copies that have filename collisions, so we cannot force Subversion
servers and clients to normalize filenames. We must let users choose,
per repository, whether filenames are normalized.
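
To make the collision question concrete, here is a minimal Python sketch of
the check Thomas describes (normalise to NFD, then test for uniqueness). The
function name and the sample filenames are made up for illustration; this is
not Subversion code, only a way to see which names would collide.

```python
import unicodedata
from collections import defaultdict


def find_normalization_collisions(names):
    """Return groups of names that differ as stored but are equal after NFD."""
    groups = defaultdict(set)
    for name in names:
        # Names that normalize to the same NFD string land in the same group.
        groups[unicodedata.normalize("NFD", name)].add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical sample names: the same visible filename in two compositions.
nfc = "\u00c5kesson.txt"    # precomposed LATIN CAPITAL LETTER A WITH RING ABOVE
nfd = "A\u030akesson.txt"   # 'A' followed by COMBINING RING ABOVE
collisions = find_normalization_collisions([nfc, nfd, "README"])
```

A repository or working copy for which this returns an empty list could be
normalized without losing a file; a non-empty list names exactly the entries
that would need manual attention first.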

-- 
(Hiroaki Nakamura) hnaka...@gmail.com


Re: Let's discuss about unicode compositions for filenames!

2012-02-11 Thread Branko Čibej
On 11.02.2012 13:05, Hiroaki Nakamura wrote:
 2012/2/9 Markus Schaber m.scha...@3s-software.com:
 Hi,

 Von: Stefan Sperling [mailto:s...@elego.de]

 On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote:
 [Upgrade options / backwards compatibility for proposed unicode 
 normalization fix]
 - Need to re-checkout existing working copies of the repository?
   = Yes, but only if config is changed from the default.
 Maybe this could even be avoided if newer clients (or a utility script) can 
 upgrade the working copy to the normalized format.
 Yes, if the working copy does not have filename collisions. However,
 for compatibility,
 we cannot let newer clients upgrade working copies automatically
 because existing
 working copies may have filename collisions.

That's not entirely true, since we can detect the collisions in advance,
and a partially upgraded working copy would still work.

From a practical point of view, it's very, very unlikely that there
would be any such collisions in a valid working copy. People would tend
to notice. :)

-- Brane



Re: Let's discuss about unicode compositions for filenames!

2012-02-11 Thread Hiroaki Nakamura
2012/2/11 Branko Čibej br...@apache.org:
 On 11.02.2012 13:05, Hiroaki Nakamura wrote:
 2012/2/9 Markus Schaber m.scha...@3s-software.com:
 Von: Stefan Sperling [mailto:s...@elego.de]
 On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote:
 [Upgrade options / backwards compatibility for proposed unicode 
 normalization fix]
 - Need to re-checkout existing working copies of the repository?
   = Yes, but only if config is changed from the default.
 Maybe this could even be avoided if newer clients (or a utility script) 
 can upgrade the working copy to the normalized format.
 Yes, if the working copy does not have filename collisions. However,
 for compatibility,
 we cannot let newer clients upgrade working copies automatically
 because existing
 working copies may have filename collisions.

 That's not entirely true, since we can detect the collisions in advance,
 and a partially upgraded working copy would still work

 From a practical point of view, it's very, very unlikely that there
 would be any such collisions in a valid working copy. People would tend
 to notice. :)

Yes, I agree wholeheartedly!
At work, I have noticed a few repositories that contain both NFC and
NFD filenames. However, as far as I know, none of them has collisions.

-- 
(Hiroaki Nakamura) hnaka...@gmail.com


Re: 1.7.3 up for signing / testing

2012-02-11 Thread Philip Martin
Hyrum K Wright hyrum.wri...@wandisco.com writes:

 At long last, here are the 1.7.3 tarballs for testing and signing:
 http://people.apache.org/~hwright/svn/1.7.3/

I'm getting intermittent failures in check-swig-rb on Ubuntu 10.04.
Some runs complete without error and some produce errors (not always the
same tests):

  1) Failure:
test_revision_status(SvnWcTest)
/home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/test_wc.rb:656:in
 `test_revision_status'
/home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/util.rb:202:in
 `make_context'
/home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/test_wc.rb:645:in
 `test_revision_status':
-1 expected but was
2.

  1) Failure:
test_delta_with_deprecated_api(SvnFsTest)
/home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/test_fs.rb:373:in
 `test_delta'
/home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/util.rb:202:in
 `make_context'
/home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/test_fs.rb:359:in
 `test_delta'
/home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/test_fs.rb:407:in
 `test_delta_with_deprecated_api':
A\n\n\n\nE\n expected but was
a\n\n\n\ne\n.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com


Re: 1.7.3 up for signing / testing

2012-02-11 Thread Philip Martin
Philip Martin philip.mar...@wandisco.com writes:

 Hyrum K Wright hyrum.wri...@wandisco.com writes:

 At long last, here are the 1.7.3 tarballs for testing and signing:
 http://people.apache.org/~hwright/svn/1.7.3/

 I'm getting intermittent failures in check-swig-rb on Ubuntu 10.04.
 Some runs complete without error and some produce errors (not always the
 same tests):

   1) Failure:
 test_revision_status(SvnWcTest)
 /home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/test_wc.rb:656:in
  `test_revision_status'
 /home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/util.rb:202:in
  `make_context'
 /home/pm/sw/subversion/sign/build/subversion/bindings/swig/ruby/test/test_wc.rb:645:in
  `test_revision_status':
 -1 expected but was
 2.

I'm getting similar FAILs with 1.7.2, so this is not a regression.  I
only see FAILs when running the tests on a ramdisk; I've run the tests
several times on a hard drive and not seen a single FAIL.



Re: 1.7.3 up for signing / testing

2012-02-11 Thread Philip Martin
Summary:

  +1 to release

Platform:

  Linux (Debian/squeeze and Ubuntu/10.04)

Tested:

  (local, svn, svn+sasl, serf, neon) x (fsfs, fsfs/pack/shard, bdb)
  (serf/v1, neon/v1) x (fsfs, bdb)
  swig-pl, swig-py, swig-rb
  javahl x (fsfs, bdb)

Results:

  All tests PASS

Local dependencies Debian:

  apache2-threaded-dev : 2.2.16-6+squeeze4
  libapr1-dev : 1.4.2-6+squeeze3
  libaprutil1-dev : 1.3.9+dfsg-5
  libdb4.8-dev : 4.8.30-2
  libneon27-dev : 0.29.3-3
  libsasl2-dev : 2.1.23.dfsg1-7
  libsqlite3-dev : 3.7.3-1
  perl : 5.10.1-17squeeze2
  python2.6-dev : 2.6.6-8+b1
  ruby1.8-dev : 1.8.7.302-2squeeze1
  openjdk-6-jdk : 6b18-1.8.9-0.1~squeeze1
  serf : trunk@1564

Local dependencies Ubuntu:

  apache2-threaded-dev : 2.2.14-5ubuntu8.7
  libapr1-dev : 1.3.8-1ubuntu0.3
  libaprutil1-dev : 1.3.9+dfsg-3ubuntu0.10.04.1
  libdb4.8-dev : 4.8.24-1ubuntu1
  libneon27-gnutls-dev : 0.29.0-1
  libsasl2-dev : 2.1.23.dfsg1-5ubuntu1
  libsqlite3-dev : 3.6.22-1
  perl : 5.10.1-8ubuntu2.1
  python2.6-dev : 2.6.5-1ubuntu6
  ruby1.8-dev : 1.8.7.249-2
  openjdk-6-jdk : 6b20-1.9.10-0ubuntu1~10.04.3
  serf : trunk@1564

subversion-1.7.3.tar.bz2

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQEcBAABCAAGBQJPNqpYAAoJEHbXiOHtGlmcbPkH/AvtnbJ8+Nz/kCEVW0ojSzHg
bLS1vevdMLbLsVHIBXMiE2UOHLUoiNJkaloSRw0YToldmyCOGMKtYWuOFruycgcq
MOOfuSZqCBG+Z6sfDN9U7Bpkmfv6wIVGc+lo6ZxrM5N7hnTr7vo4YOAq1lcTQx5Q
jAxOQxU9xb3i7yi46ynMdRhjByuealUujlnJyuZWNx6v8TQiv6gSPMEV3X7Fgcr4
MgXjC12zXn3SsQoSx0aQn7WpIjBheM9n0/EceZDpdQfLmS022btSzhLAz573q6T6
pP5eJQ2qVu+VjACL9tC8lqQw6M29RAcbUq2PX91X7iVYEreczP51u1t4E/fFPsk=
=F7Pd
-----END PGP SIGNATURE-----

subversion-1.7.3.tar.gz

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQEcBAABCAAGBQJPNqp+AAoJEHbXiOHtGlmc/J4H/AxFVKCS3A+HTc9tvDOLEmWi
18awhyiZcisyUcw3M8+X4qVG1O+ik0QnpOKsb7Vnfz3eDrFMSL8mitl6WCZJGxiL
zS431QOSAy+EmDuTQvhz8RQa+G1F6GpPHX3nCDYhwPxpNMBYq/rlNg7efnjrOM79
GBSo0wf9jTMV23oGITptoYtcCiYjUFY+cQFhz6nK2J/n8wkXg18vFjKLpJm6Vwrl
/0KN7tOPbB8+PvbEc7tA5SVxeoy1lXwY85o6wUD49OWZWn7m6Da0w8e2r7NdtqoI
DCO8+QEsUX3AJgSITFWE8kMntQFymDrC7RSXbwyHVL1GlpzTUvViY2YXRZ9k8xM=
=0oqV
-----END PGP SIGNATURE-----



Re: [RFC] Inheritable Properties

2012-02-11 Thread Stefan Fuhrmann

On 10.02.2012 12:21, Branko Čibej wrote:

On 07.02.2012 22:24, Stefan Fuhrmann wrote:

On 07.02.2012 00:41, Greg Stein wrote:

In most data storage mechanisms for the repository, inheritable
properties are a performance killer.

I'm not sure that this is actually applicable to SVN
for two reasons:
(1) we use deltification and

I have absolutely no idea how deltification helps with inheritable
properties.


Obviously. There are two important points to make here.
First, a system like Subversion *must* use some sort of
fragmented / multi-step data access. Indirect access to
properties is not something extraordinary here. Second,
access to large databases is dominated by their physical
organization. Details on both points below.

(2) we often handle whole file trees

Neither here nor there.


On the contrary. This is an essential difference from, e.g., NTFS.
Subversion reads individual nodes only very rarely while
most OSes can open single files only. Checking for props,
reading and finally evaluating them must be as fast as possible.

Inherited properties eliminate the need to read props on
most nodes (only checking that there is no local override).
Even the evaluation of e.g. inherited ACLs may be skipped
if the semantics has been chosen appropriately. This is a
perfect example of how elimination of redundancy (e.g. by
deltification) improves performance rather than incurring
a penalty.

Inheritable properties would be /relatively/ less of a killer in SVN
backends because we're already doing lookups the silly way, i.e., a
lookup for /a/b/c will resolve and read /a and a/b while searching for
.../c, so it's not much extra work to keep the current values of
inheritable properties in the lookup context.
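
As a rough illustration of that top-down resolution, the Python sketch below
walks a path one component at a time and carries the inheritable properties
along in the lookup context. The nested-dict tree layout, the function name
and the property names are all assumptions made for this example; they do not
mirror the FSFS or BDB data structures, and for brevity every property is
treated as inheritable.

```python
def lookup_with_inherited_props(root, path):
    """Resolve PATH from ROOT, accumulating properties on the way down."""
    node = root
    inherited = dict(node.get("props", {}))
    for component in [c for c in path.split("/") if c]:
        # Each step reads the intermediate directory anyway, so picking up
        # its properties costs almost nothing extra.
        node = node["children"][component]   # KeyError if PATH does not exist
        inherited.update(node.get("props", {}))   # nearer nodes override
    return node, inherited


# Hypothetical tree: "acl" is set on the root and overridden on /a.
tree = {
    "props": {"acl": "read-all"},
    "children": {
        "a": {
            "props": {"acl": "team-only"},
            "children": {
                "b": {
                    "children": {
                        "c": {"props": {"mime": "text/plain"}, "children": {}},
                    },
                },
            },
        },
    },
}
node, props = lookup_with_inherited_props(tree, "/a/b/c")
```

Here `props` ends up holding the override from /a together with the property
set on /a/b/c itself, without any second pass back up the tree.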


The silly part of FSFS is that it does not yet optimize access
paths but stores changes individually. The challenge
is our two-dimensional key space and the fact that different
operations traverse the data along different dimensions
(e.g. log vs. checkout).

With my latest commit, the caching code allows for more
or less O(1) access / O(n) traversal along these dimensions.

A proper lookup would jump straight to /a/b/c without examining the
intermediate directories, and /then/ it would have to climb back up the
tree to find inheritable props (or ACLs, same difference in this case).
For a real filesystem, that's definitely a performance killer, and the
reason why NTFS fakes ACL inheritance. The assumption is that you'll be
changing inheritable ACLs a lot less often than you will be reading
them, so the storage/performance tradeoff is definitely worth it.


Question: how many entries would a direct lookup structure
need to have (i.e. path@rev -> data pointer)? Keep in
mind that many valid paths like /branches/foo/bar will never
be mentioned anywhere in a SVN repository because they
never got touched under that name. A rough estimation for
a fully expanded list of entries is

#nodesInTrunk * #openBranches * #revisions

This yields 10^9 entries for small repositories and 10^14
for KDE-sized ones. Clearly impractical.
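
For anyone who wants to redo the arithmetic, a small Python sketch follows.
The per-factor figures are assumptions picked only to reproduce the orders of
magnitude quoted above; they are not measurements of any real repository.

```python
def fully_expanded_entries(nodes_in_trunk, open_branches, revisions):
    """Number of path@rev entries if every combination were stored."""
    return nodes_in_trunk * open_branches * revisions


# Assumed "small" repository: 10^4 nodes, 10 open branches, 10^4 revisions.
small = fully_expanded_entries(10**4, 10, 10**4)

# Assumed KDE-sized repository: 10^6 nodes, 10^2 branches, 10^6 revisions.
large = fully_expanded_entries(10**6, 10**2, 10**6)
```

With these inputs `small` is 10^9 and `large` is 10^14. Even at a few bytes
per entry the larger figure is far beyond what an index could reasonably hold.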

Even NTFS does not attempt a direct mapping but uses
a tree structure and simply hopes to cache enough nodes
to make access performance acceptable. The differences
to FSFS are details of the tree representation.

I suspect the situation in SVN FS is quite similar, and if we
restructured the way the directory tree is represented to something
similar to how WC-NG (or Mercurial) does it, these issues would suddenly
become more important.


For the working copy, things are different because we
are more likely to access single items and we need
to support data changes. The latter calls for more flexible /
generic data structures than the r/o data backend where
small size can be made to equal high performance.

Storage / performance tradeoffs on the *client side* are
plausible, though.

-- Stefan^2.


Re: svn commit: r1241718 - in /subversion/trunk/subversion/libsvn_fs_fs: caching.c fs.h fs_fs.c

2012-02-11 Thread Stefan Fuhrmann

On 09.02.2012 16:05, Daniel Shahaf wrote:

stef...@apache.org wrote on Wed, Feb 08, 2012 at 00:44:26 -:

Author: stefan2
Date: Wed Feb  8 00:44:26 2012
New Revision: 1241718

URL: http://svn.apache.org/viewvc?rev=1241718&view=rev
Log:
Major improvement in delta window handling: Cache intermediate
combined delta windows such that changes close by won't need
to discover and read the whole chain again.

For algorithms that traverse history linearly, this optimization
gives delta combination an amortized constant runtime.

For now, we only cache representations < 100kB. Support for larger
reps can be added later.

* subversion/libsvn_fs_fs/fs.h
   (fs_fs_data_t): add cache for combined windows
* subversion/libsvn_fs_fs/caching.c
   (svn_fs_fs__initialize_caches): initialize that cache

* subversion/libsvn_fs_fs/fs_fs.c
   (rep_state): add reference to new cache
   (create_rep_state_body): init that reference
   (rep_read_baton): add reference to cached base window
   (get_cached_combined_window, set_cached_combined_window):
new utility functions
   (build_rep_list): terminate delta chain early if cached
base window is available
   (rep_read_get_baton): adapt caller
   (get_combined_window): re-implement
   (get_contents): handle new special case; adapt to
get_combined_window() signature changes

Modified:
 subversion/trunk/subversion/libsvn_fs_fs/caching.c
 subversion/trunk/subversion/libsvn_fs_fs/fs.h
 subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c


I haven't reviewed this, but a question:


+/* Read the WINDOW_P for the rep state RS from the current FSFS session's
+ * cache. This will be a no-op and IS_CACHED will be set to FALSE if no
+ * cache has been given. If a cache is available IS_CACHED will inform
+ * the caller about the success of the lookup. Allocations (of the window
+ * in particular) will be made from POOL.
+ */
+static svn_error_t *
+get_cached_combined_window(svn_stringbuf_t **window_p,
+                           struct rep_state *rs,
+                           svn_boolean_t *is_cached,
+                           apr_pool_t *pool)
+{
+  if (! rs->combined_cache)
+    {
+      /* txdelta window has not been enabled */
+      *is_cached = FALSE;
+    }
+  else
+    {
+      /* ask the cache for the desired txdelta window */
+      return svn_cache__get((void **)window_p,
+                            is_cached,
+                            rs->combined_cache,
+                            get_window_key(rs, rs->start, pool),

How does the cache key identify the particular combined window being
cached?


Undeltified windows use the same key as their deltified
representation; basically revision file + offset. The distinction
between deltified and un-deltified is made by the cache
instance prefix.

get_window_key() may return "".  If it returns "" when called as an
argument to svn_cache__set() and then also here, won't the cache return
a wrong result?


There is a comment in get_window_key() for this case.
"" will only be returned if we can't get the name of the
open APR file. This is virtually impossible. If it happens
anyway, it will hit the deltified window cache first, and we will
report a repository corruption.

But maybe we should change the cache API definition
to support and reject NULL keys. get_window_key() could
then return simply NULL and could do with fewer assumptions.

-- Stefan^2.



Re: svn commit: r1241718 - in /subversion/trunk/subversion/libsvn_fs_fs: caching.c fs.h fs_fs.c

2012-02-11 Thread Daniel Shahaf
Stefan Fuhrmann wrote on Sun, Feb 12, 2012 at 03:06:31 +0100:
 On 09.02.2012 16:05, Daniel Shahaf wrote:
 stef...@apache.org wrote on Wed, Feb 08, 2012 at 00:44:26 -:
 Author: stefan2
 Date: Wed Feb  8 00:44:26 2012
 New Revision: 1241718
 
 URL: http://svn.apache.org/viewvc?rev=1241718&view=rev
 Log:
 Major improvement in delta window handling: Cache intermediate
 combined delta windows such that changes close by won't need
 to discover and read the whole chain again.
 
 For algorithms that traverse history linearly, this optimization
 gives delta combination an amortized constant runtime.
 
 For now, we only cache representations < 100kB. Support for larger
 reps can be added later.
 
 * subversion/libsvn_fs_fs/fs.h
(fs_fs_data_t): add cache for combined windows
 * subversion/libsvn_fs_fs/caching.c
(svn_fs_fs__initialize_caches): initialize that cache
 
 * subversion/libsvn_fs_fs/fs_fs.c
(rep_state): add reference to new cache
(create_rep_state_body): init that reference
(rep_read_baton): add reference to cached base window
(get_cached_combined_window, set_cached_combined_window):
 new utility functions
(build_rep_list): terminate delta chain early if cached
 base window is available
(rep_read_get_baton): adapt caller
(get_combined_window): re-implement
(get_contents): handle new special case; adapt to
 get_combined_window() signature changes
 
 Modified:
  subversion/trunk/subversion/libsvn_fs_fs/caching.c
  subversion/trunk/subversion/libsvn_fs_fs/fs.h
  subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
 
 I haven't reviewed this, but a question:
 
 +/* Read the WINDOW_P for the rep state RS from the current FSFS session's
 + * cache. This will be a no-op and IS_CACHED will be set to FALSE if no
 + * cache has been given. If a cache is available IS_CACHED will inform
 + * the caller about the success of the lookup. Allocations (of the window
 + * in particular) will be made from POOL.
 + */
 +static svn_error_t *
 +get_cached_combined_window(svn_stringbuf_t **window_p,
 +                           struct rep_state *rs,
 +                           svn_boolean_t *is_cached,
 +                           apr_pool_t *pool)
 +{
 +  if (! rs->combined_cache)
 +    {
 +      /* txdelta window has not been enabled */
 +      *is_cached = FALSE;
 +    }
 +  else
 +    {
 +      /* ask the cache for the desired txdelta window */
 +      return svn_cache__get((void **)window_p,
 +                            is_cached,
 +                            rs->combined_cache,
 +                            get_window_key(rs, rs->start, pool),
 How does the cache key identify the particular combined window being
 cached?
 
 Undeltified windows use the same key as their deltified
 representation; basically revision file + offset. The distinction
 between deltified and un-deltified is made by the cache
 instance prefix.

What revision file and what offset, and how do they relate to the
window object contained in the cache?

(I'm going to guess that the key is the rev-file/offset of a rep that
generates the same fulltext as the cached window; but I shouldn't have
to guess.)

 get_window_key() may return "".  If it returns "" when called as an
 argument to svn_cache__set() and then also here, won't the cache return
 a wrong result?
 
 There is a comment in get_window_key() for this case.
 "" will only be returned if we can't get the name of the
 open APR file. This is virtually impossible. If it happens
 anyway, it will hit the deltified window cache first, and we will
 report a repository corruption.
 

In plain English: there is an unlikely, but not impossible, scenario
where the only thing between your new code and silent corruption
(specifically: incorrect retrieval of a fulltext) is the order of
lookups in two different caches.

That sounds awfully brittle to me, and the sensitivity of the lookup
order is not documented anywhere.

 But maybe we should change the cache API definition
 to support and reject NULL keys. get_window_key() could
 then return simply NULL and could do with fewer assumptions.
 

What does "support and reject" mean?  That trying to get(key=NULL)
always returns "not found" and trying to set(key=NULL) doesn't change
the cache's state?
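
To pin down the semantics being asked about, here is a toy Python sketch of
that interpretation, with None standing in for NULL. The class, its method
names and the sample keys are hypothetical and are not the svn_cache__* API;
the sketch only shows a cache in which a missing key can never match an entry.

```python
class KeyedCache:
    """Toy cache in which a None key never matches and is never stored."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        """Return (value, found); a None key always reports a miss."""
        if key is None:
            return None, False
        if key in self._items:
            return self._items[key], True
        return None, False

    def set(self, key, value):
        """Store VALUE under KEY; a None key leaves the cache unchanged."""
        if key is None:
            return
        self._items[key] = value


cache = KeyedCache()
cache.set(None, "window for rep A")        # ignored: no usable key
cache.set("r5/1234", "window for rep B")   # hypothetical rev-file/offset key
```

Under these rules two unrelated windows can no longer share one fallback key,
so a lookup that cannot build its key degrades to a cache miss instead of
depending on the order in which the caches are consulted.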

 -- Stefan^2.
 


Re: [RFC] Inheritable Properties

2012-02-11 Thread Branko Čibej
On 12.02.2012 02:52, Stefan Fuhrmann wrote:

 The silly part of FSFS is that it does not yet optimize access
 paths but stores changes individually. The challenge
 is our two-dimensional key space and the fact that different
 operations traverse the data along different dimensions
 (e.g. log vs. checkout). 

Interestingly enough, the 2D keyspace isn't that big a problem. The real
issue is that we don't even represent all the relevant keys, because of
the lazy copying of subtrees. That's what actually prevents us from
doing one-shot lookups of arbitrary path@rev; and even then, we'd only
really have to do a step-by-step top-down resolve if the initial fast
lookup failed.

 Question: how many entries would a direct lookup structure
 need to have (i.e. path@rev -> data pointer)? Keep in
 mind that many valid paths like /branches/foo/bar will never
 be mentioned anywhere in a SVN repository because they
 never got touched under that name. A rough estimation for
 a fully expanded list of entries is

 #nodesInTrunk * #openBranches * #revisions

 This yields 10^9 entries for small repositories and 10^14
 for KDE-sized ones. Clearly impractical.

So that's not how you do it. :)

You'd only need one reference per actual node, not one per possible
lookup path including revisions. Obviously you have to resolve path@rev
to a concrete node before you can do anything with its attributes. With
the caveat that there's a nasty edge case involving not-yet-lazy-copied
child nodes; but given the way we currently crawl the tree, that's not
really an issue.

-- Brane