Any idea why public function like svn_fspath__dirname have double __ in its name?
Hi All, I could see the following functions in our public header with '__' in their names, svn_fspath__is_canonical svn_fspath__dirname svn_fspath__basename svn_fspath__split svn_fspath__join svn_fspath__is_child svn_fspath__skip_ancestor svn_fspath__is_ancestor svn_fspath__get_longest_ancestor svn_error__locate svn_error__malfunction IIUC __ convention is used for semi-public functions where we may not keep BC promise. Do I miss something? With regards Kamesh Jayachandran
Merge tracking between several parallel branches
This question has already been posted on the users list, but I did not receive any answer there. I have some questions regarding the status of the merge tracking functionality in subversion 1.6. Based on my understanding, it seems like svn has good “automatic” support for merging between branches in one direction, whereas cyclic/bidirectional merging requires a more complex “reintegrate” + “record only” merge procedure. I am wondering if this support also extends to scenarios with repeated merging between several parallel release branches. Does svn handle arbitrary directed acyclic merging between several parallel branches, when merging is always conducted in the same direction (e.g. from old to newer release branch)? Does the “reintegrate” + “record only” merge procedure extend gracefully to scenarios with arbitrary cyclic/bidirectional merging between several parallel branches? Thanks in advance, Fredrik Orderud
Re: Default commandline args
On 20.12.2010 23:28, Hyrum K. Wright wrote: In issue #3765, I suggest the possibility of a CVS-style config file wherein a user could specify a set of default arguments for a given subcommand. For example, 'svn diff' would default to 'svn diff -x -p' if the config file had an entry for such. Thinking this would be an SMOP, I jumped in...only to discover that it isn't. The actual implementation isn't hard, I'm just wondering about where the configuration should live. We have a ~/.subversion/config file, but these options aren't for the client library, they are specific to the commandline client. This makes me think that we should have a separate file for the commandline client, and that it may want to be something like ~/.svnrc or something equally commandline-ish. Thoughts? -Hyrum http://subversion.tigris.org/issues/show_bug.cgi?id=3765 I think this kind of feature can be useful and since it is a UI optimization specific / local to a particular client, I don't see why it should not be implemented. TSVN, for instance, uses its own implementation to manage user preferences and these are often quite specific to TSVN's UI needs. So, there is little room for or potential benefit of reuse across various client implementations. Not to limit the kind of information a particular client could store in its config file (e.g. multiple sections per client instead of one), I would prefer separate config files for each client. The number of clients would be 1 and I would not like to see my home folder being cluttered by tons of .svnrc, .svnadminrc ... files. Keep them in ~/.subversion/ . -- Stefan^2.
How is mixed authentication/anonymous access implemented
Hi, SVN features a mixed authentication/anonymous access (see http://svnbook.red-bean.com/nightly/en/svn.serverconfig.httpd.html#svn.serverconfig.httpd.authz.perdir.ex-3). I want to achieve the same functionality using a PHP script: allow anonymous access until accessing some special content and than request authentification which should be checked according to a htaccess-file. As far as i understand the SVN example the authentification is performed by the Apache modules. I configured the .htaccess file to look similar: Order allow,deny Allow from all AuthType Basic AuthName Realm AuthUserFile /path/to/.htusers require valid-user Satisfy any Additionally a PHP script is inside the same folder. When you now browse to the URL of the PHP script, you can access it without any credentials requested. At some point the PHP script decides that authentification is required (e.g. when passing a param like ?need-auth=1). I suppose this is similar to how the mixed authentication/anonymous access in SVN works (?). Therefore it sends the following two headers: WWW-Authenticate: Basic realm=Realm HTTP/1.x 401 Unauthorized Then the user is asked to insert username/password for the basic auth. But now comes the problem: The apache will ALWAYS let the user pass as anonymous access is always granted. I suppose the webserver does not even try to authenticate the user credentials. Therefore it is not possible to decide in PHP if the user is anonymous or has been successfully authenticated. How is this performed in SVN for the mixed authentication/anonymous access? What i do not want is: - check the credentials in PHP (due to the many different auth-methods which could be configured with Apache) - have a dummy anonymous user like guest with password guest - split anonymous and authenticated parts in separate folders (to use separate .htaccess-files) I hope to get some enlightenment from the way SVN realizes this feature. Any feedback is highly appreciated. Thank you Dirk
Re: Safe to add a field to svn_error_t?
On 24.12.2010 19:32, Blair Zajac wrote: On 12/24/10 6:07 AM, Hyrum K Wright wrote: On Fri, Dec 24, 2010 at 12:10 AM, Blair Zajacbl...@orcaware.com wrote: The addition of tracing to svn_error_t in 1.7 currently uses a special message string traced call. Instead of doing this, can we safely add an svn_boolean_t to svn_error_t the indicates if it is a traced error? While svn_error_t is public, I'm hoping most people are creating them with svn_error_create*(), so if that's the case, then code written and compiled against svn= 1.6 would work fine when linked against a 1.7 library. Thoughts? We don't explicitly say only use approved methods to create this object, but we are pretty implicit about it, with an entire documentation group for Error creation and destruction. I'm on the fence on this, but would lean toward go ahead and extend the struct if pressured. What problem are you trying to solve? There's this comment in svn_error__is_tracing_link(): svn_boolean_t svn_error__is_tracing_link(svn_error_t *err) { #ifdef SVN_ERR__TRACING /* ### A strcmp()? Really? I think it's the best we can do unless ### we add a boolean field to svn_error_t that's set only for ### these placeholder error chain items. Not such a bad idea, ### really... */ return (err err-message !strcmp(err-message, SVN_ERR__TRACED)); #else return FALSE; #endif } What is the use-case for this function anyways? IMHO, an svn_error_t says There was problem $message when executing the instruction at $file, $line. To be more precise it was inner error info follows. The fact that most error messages are just traced call simply means we did not bother to give more specific descriptions. But I don't see a real semantic difference because everybody along the calling chain may have contributed to the problem. At best one might use this function to reduce the length of the error report hoping that all the default text in between are not relevant to the particular scenario. I thought of two other solutions: 1) a sentinel message and we compare by address. 2) negative source file line numbers. To me, they seems at least as hacky as the current solution. I could easily live with them but so can I with the current code. -- Stefan^2.
Re: Merge tracking between several parallel branches
On Tue, Dec 28, 2010 at 01:08:47PM +0100, Fredrik Orderud wrote: This question has already been posted on the users list, but I did not receive any answer there. Hi Fredrik, The dev@ list is not an escalation path for unanswered questions from us...@. I'd suggest that you try posting to users@ again. Maybe it just happened that nobody had time to answer when you sent your original question. You can reply to your own users@ post to raise awareness of the fact that you're still looking for an answer. Good luck, Stefan
Re: [PATCH] fixes some path related deprecation warnings
Prabhu --- subversion/libsvn_repos/commit.c(revision 1053267) +++ subversion/libsvn_repos/commit.c(working copy) @@ -231,7 +231,7 @@ svn_node_kind_t kind; svn_revnum_t cr_rev; svn_repos_authz_access_t required = svn_authz_write; - const char *full_path = svn_path_join(eb-base_path, path, pool); + const char *full_path = svn_dirent_join(eb-base_path, path, pool); I think you need to use svn_fspath__join Same in other parts of this patch. Because eb-basepath is a path inside fs as per the below snipped comment. snip /* Location in fs where the edit will begin. */ const char *base_path; /snip With regards Kamesh Jayachandran
Re: Any idea why public function like svn_fspath__dirname have double __ in its name?
On 12/28/2010 04:30 AM, Kamesh Jayachandran wrote: Hi All, I could see the following functions in our public header with '__' in their names, svn_fspath__is_canonical svn_fspath__dirname svn_fspath__basename svn_fspath__split svn_fspath__join svn_fspath__is_child svn_fspath__skip_ancestor svn_fspath__is_ancestor svn_fspath__get_longest_ancestor svn_error__locate svn_error__malfunction IIUC __ convention is used for semi-public functions where we may not keep BC promise. Do I miss something? While we've mandated that __ must be used for semi-public functions, we've never said that it can't be used for private ones. One could argue that the use of __ for even private functions avoids a required rename if the function is made semi-public, and adds information. For example, I know that most of the functions you list above are FS-related and probably logically related. But I wouldn't suggest a universal policy change at this stage. -- C. Michael Pilato cmpil...@collab.net CollabNet www.collab.net Distributed Development On Demand signature.asc Description: OpenPGP digital signature
Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)
On Fri, Dec 24, 2010 at 3:40 PM, Stefan Fuhrmann eq...@web.de wrote: On 20.12.2010 02:43, Johan Corveleyn wrote: On Wed, Dec 15, 2010 at 10:58 AM, Stefan Fuhrmanneq...@web.de wrote: On 15.12.2010 02:30, Stefan Fuhrmann wrote: On 14.12.2010 23:35, Johan Corveleyn wrote: Thoughts? Two things you might try: * introduce a local variable for afile[i] * replace the for() loop with two nested ones, keeping calls to other functions out of the hot spot: for (i=0; i file_len;) That should read: for (i=0; i file_len; i++) { /* hot spot: */ for(; i file_len; i++) { curFile = afile[i]; if (curFile-chunk==-1) curFile-chunk = 0; else if (curFile-curp != curFile-endp -1) curFile-curp++; else break; } if (i file_len) { /* the complex, rarely used stuff goes here */ } } Ok, I tried this, but it didn't really help. It's still only about as fast as before. While the macro version is about 10% faster (I made a cleaner macro version, only macro'ing the hotspot stuff, with a function call to the complex, rarely used stuff). For the inline version, I tried several variations (always with APR_INLINE in the function signature, and with local variable for file[i]): with or without the nested for loop, with or without the complex stuff factored out to a separate function, ... it made no difference. Maybe I'm still doing something wrong, keeping the compiler from inlining it (but I'm not really a compiler expert, maybe I have to add some compiler options or something, ...). OTOH, I now have a macro implementation which is quite readable (and which uses the do ... while(0) construct - thanks Peter), and which does the trick. So maybe I should simply stick with that option ... The key factor here is that file_len is only 2. Basically, that will make my optimization a pessimization. Assuming that for most calls, curp of *both* files will be somewhere *between* BOF and EOF, the unrolling the loop may help: #define INCREMENT_POINTERS /* special code for the common case. 10 .. 12 ticks per execution */ static APR_INLINE svn_boolean_t quickly_increment_pointers(struct file_info *afile[]) { struct file_info *file0 = afile[0]; struct file_info *file1 = afile[1]; if ((afile0-chunk != -1) (afile1-chunk != -1)) { /* suitable_type */ nextp0 = afile0-curp + 1; /* suitable_type */ nextp1 = afile1-curp + 1; if ((nextp0 != afile0-endp) (nextp1 != afile1-endp)) { afile0-curp = nextp0; afile1-curp = nextp1; return TRUE; } } return FALSE; } /* slow code covering all special cases */ static svn_error_t* slowly_increment_pointers(struct file_info *file[], apr_size_t file_len, apr_pool_t *pool) { for (i = 0; i file_len;i++) ... } /* maybe as macro: */ return ((file_len == 2) quickly_increment_pointers (file)) ? SVN_NO_ERROR : slowly_increment_pointers(file, file_len, pool); Nice! I will try this out sometime soon, but I'm not so sure it will help more than the current macro version (only macro for the critical section, calling a function for increment/decrement chunk - see diff_file.c in HEAD of diff-optimizations-bytes branch). I guess I'll just have to test it. With the macro implementation in place my gut feeling is that the prefix/suffix scanning is reasonably optimal, and that the remaining time spent in the diff code (70 seconds for my example blame of big xml file with 2200 changes) is almost all due to the original diff algorithm (working on the remainder between prefix and suffix). But to be sure I should probably measure that first. (I should note here that, with my example file, the prefix/suffix scanning doesn't always work optimally, because in about half of the revisions there is a change both in one of the first lines (where an author name is inserted automatically by XML Spy upon saving it), and then somewhere in the middle where the actual change is done. It's really a pity, because that means about 2-3 lines cannot be skipped as prefix.) Minor remark on C style: static APR_INLINE svn_error_t * increment_pointers(struct file_info *file[], int file_len, apr_pool_t *pool) file_len should be apr_size_t unless it can assume negative values in which case it should be apr_ssize_t.That way, you know for sure it will never overflow. Int is potentially too short (although extrmly unlikely in this case) and it is signed (often not appropriate for an index value). Thanks. I didn't know that. I'll change it. One more thing that might help speed up your code. The fact that the calling overhead of a single simple function turns out to be a significant contribution to the overall runtime tells us that the execution is tightly CPU-bound and squeezing some extra cycles out of it could help. Try struct file_info file[] instead of struct file_info *file[] i.e. array of structs instead of array of pointers. The boring background
Re: Any idea why public function like svn_fspath__dirname have double __ in its name?
[C. Michael Pilato] svn_fspath__is_canonical svn_fspath__dirname svn_fspath__basename svn_fspath__split svn_fspath__join svn_fspath__is_child svn_fspath__skip_ancestor svn_fspath__is_ancestor svn_fspath__get_longest_ancestor svn_error__locate svn_error__malfunction While we've mandated that __ must be used for semi-public functions, we've never said that it can't be used for private ones. Kamesh is talking about public, not private, functions. I.e., ones where we actually do have an ABI promise to keep. I looked at svn_error__locate last week. It's really only useful in --enable-maintainer-mode, but the way it's implemented, it ends up being used by macros in the public (non-maintainer) ABI, so even if we eliminate those callers, we have to supply the function itself forever. I've addressed this as best we can, in r1053469. Supplying a given ABI forever is the sort of thing I thought we didn't have to do for __ functions. I think that's Kamesh's question too. -- Peter Samuelson | org-tld!p12n!peter | http://p12n.org/
Re: svn commit: r1053461 - /subversion/site/publish/doap.rdf
On Tue, Dec 28, 2010 at 5:00 PM, hwri...@apache.org wrote: Author: hwright Date: Tue Dec 28 22:00:37 2010 New Revision: 1053461 URL: http://svn.apache.org/viewvc?rev=1053461view=rev Log: First hack at a DOAP file for Subversion, as requested by the Powers That Be at the ASF. (But I think it's a Good Thing, too. ;) This was autogenerated by http://projects.apache.org/create.html * publish/doap.rdf: New. Added: subversion/site/publish/doap.rdf (with props) Added: subversion/site/publish/doap.rdf URL: http://svn.apache.org/viewvc/subversion/site/publish/doap.rdf?rev=1053461view=auto == --- subversion/site/publish/doap.rdf (added) +++ subversion/site/publish/doap.rdf Tue Dec 28 22:00:37 2010 @@ -0,0 +1,51 @@ +?xml version=1.0? +?xml-stylesheet type=text/xsl? +rdf:RDF xml:lang=en + xmlns=http://usefulinc.com/ns/doap#; + xmlns:rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#; + xmlns:asfext=http://projects.apache.org/ns/asfext#; + xmlns:foaf=http://xmlns.com/foaf/0.1/; +!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the License); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an AS IS BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +-- + Project rdf:about=http://subversion.apache.org/; + created2010-12-28/created + license rdf:resource=http://usefulinc.com/doap/licenses/asl20; / + nameApache Subversion/name + homepage rdf:resource=http://subversion.apache.org/; / + asfext:pmc rdf:resource=http://subversion.apache.org; / + shortdescEnterprise-class centralized version control for the masses/shortdesc + descriptionSubversion exists to be universally recognized and adopted as an open-source, centralized version control system characterized by its reliability as a safe haven for valuable data; the simplicity of its model and usage; and its ability to support the needs of a wide variety of users and projects, from individuals to large-scale enterprise operations./description + bug-database rdf:resource=http://subversion.tigris.org/issues/; / Couldn't we just use this URL for bug-database? http://subversion.apache.org/issue-tracker.html + mailing-list rdf:resource=http://subversion.apache.org/mailing-lists.html; / + download-page rdf:resource=http://subversion.apache.org/source-code.html; / + programming-languageC/programming-language + category rdf:resource=http://projects.apache.org/category/build-management; / They do not have version control or SCM as a category to use? I do not think of Build Management as category for Subversion. -- Thanks Mark Phippard http://markphip.blogspot.com/
[Be|Na]gging for review
Hi devs, I would really like the core improvement of my work on SVN performance to become part of 1.7: the in-process fulltext cache. While the risk is low (we already offer something similar based on memcached), the caching code is so fundamental to most other pending changes, that I want to hear at least a second opinion: ^/subversion/branches/integrate-cache-membuffer Take this as an opportunity for a last nightly deed (Ny!) in 2010 or a new year's resolution. -- Stefan^2.
Re: FSFS format 6 (was: Re: FSv2)
On Sun, Dec 12, 2010 at 4:23 PM, Stefan Fuhrmann stefanfuhrm...@alice-dsl.de wrote: On 19.10.2010 15:10, Daniel Shahaf wrote: Greg Stein wrote on Tue, Oct 19, 2010 at 04:31:42 -0400: Personally, I see [FSv2] as a broad swath of API changes to align our needs with the underlying storage. Trowbridge noted that our current API makes it *really* difficult to implement an effective backend. I'd also like to see a backend that allows for parallel PUTs during the commit process. Hyrum sees FSv2 as some kind of super-key-value storage with layers on top, allowing for various types of high-scaling mechanisms. At the retreat, stefan2 also had some thoughts about this... [This is just a brain-dump for 1.8+] While working on the performance branch I made some observations concerning the way FSFS organizes data and how that could be changed for reduced I/O overhead. notes/fsfs-improvements.txt contains a summary of that could be done to improve FSFS before FS-NG. A later FS-NG implementation should then still benefit from the improvements. +(number of fopen calls during a log operation) I like this proposal a lot. As I already told before, we are running our FSFS back-end on a SAN over NFS (and I suspect we're not the only company doing this). In this environment, the server-side I/O of SVN (especially the amount of random reads and fopen calls during e.g. log) is often the major bottleneck. There is one question going around in my head though: won't you have to change/rearrange a lot of the FS layer code (and maybe repos layer?) to benefit from this new format? The current code is written in a certain way, not particularly optimized for this new format (I seem to remember log does around 10 fopen calls for every interesting rev file, each time reading a different part of it). Also, if an operation currently needs to access many revisions (like log or blame), it doesn't take advantage at all of the fact that they might be in a single packed rev file. The pack file is opened and seeked in just as much as the sum of the individual rev files. So: how will the current code be able to take advantage of this new format? Won't this require a major effort to restructure that code? (This reminds me of the current difficulty (as I can see it, as an innocent bystander) with the WC-NG rewrite: theoretically it should be very fast, but the higher level code is still largely based upon the old principles. So to take advantage of it, certain things have to be changed at the higher level, making operations work dir-based or tree-based, instead of file-based etc). Cheers, -- Johan
Re: svn commit: r1053461 - /subversion/site/publish/doap.rdf
On Tue, Dec 28, 2010 at 5:36 PM, Mark Phippard markp...@gmail.com wrote: On Tue, Dec 28, 2010 at 5:00 PM, hwri...@apache.org wrote: Author: hwright Date: Tue Dec 28 22:00:37 2010 New Revision: 1053461 URL: http://svn.apache.org/viewvc?rev=1053461view=rev Log: First hack at a DOAP file for Subversion, as requested by the Powers That Be at the ASF. (But I think it's a Good Thing, too. ;) This was autogenerated by http://projects.apache.org/create.html * publish/doap.rdf: New. Added: subversion/site/publish/doap.rdf (with props) Added: subversion/site/publish/doap.rdf URL: http://svn.apache.org/viewvc/subversion/site/publish/doap.rdf?rev=1053461view=auto == --- subversion/site/publish/doap.rdf (added) +++ subversion/site/publish/doap.rdf Tue Dec 28 22:00:37 2010 @@ -0,0 +1,51 @@ +?xml version=1.0? +?xml-stylesheet type=text/xsl? +rdf:RDF xml:lang=en + xmlns=http://usefulinc.com/ns/doap#; + xmlns:rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#; + xmlns:asfext=http://projects.apache.org/ns/asfext#; + xmlns:foaf=http://xmlns.com/foaf/0.1/; +!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the License); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an AS IS BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +-- + Project rdf:about=http://subversion.apache.org/; + created2010-12-28/created + license rdf:resource=http://usefulinc.com/doap/licenses/asl20; / + nameApache Subversion/name + homepage rdf:resource=http://subversion.apache.org/; / + asfext:pmc rdf:resource=http://subversion.apache.org; / + shortdescEnterprise-class centralized version control for the masses/shortdesc + descriptionSubversion exists to be universally recognized and adopted as an open-source, centralized version control system characterized by its reliability as a safe haven for valuable data; the simplicity of its model and usage; and its ability to support the needs of a wide variety of users and projects, from individuals to large-scale enterprise operations./description + bug-database rdf:resource=http://subversion.tigris.org/issues/; / Couldn't we just use this URL for bug-database? http://subversion.apache.org/issue-tracker.html Sure could. :) (I'll probably get around to that at some point, but if anybody else wants to beat me to it, feel free.) + mailing-list rdf:resource=http://subversion.apache.org/mailing-lists.html; / + download-page rdf:resource=http://subversion.apache.org/source-code.html; / + programming-languageC/programming-language + category rdf:resource=http://projects.apache.org/category/build-management; / They do not have version control or SCM as a category to use? I do not think of Build Management as category for Subversion. From http://projects.apache.org/categories.html: build-managementProjects related to building/maintaining source code/websites. That seems to fit the Subversion bill pretty well. But if we'd like, we can include additional categories as well. -Hyrum
Re: svn commit: r1053461 - /subversion/site/publish/doap.rdf
On Tue, Dec 28, 2010 at 8:13 PM, Hyrum K Wright hy...@hyrumwright.org wrote: + category rdf:resource=http://projects.apache.org/category/build-management; / They do not have version control or SCM as a category to use? I do not think of Build Management as category for Subversion. From http://projects.apache.org/categories.html: build-management Projects related to building/maintaining source code/websites. That seems to fit the Subversion bill pretty well. But if we'd like, we can include additional categories as well. Based on the list of categories, I can see why you chose that one, but I think we need to create a new category. Subversion is not a build management tool and it does not make sense to lump it in with Maven, Ant, Ivy etc. I think version control is probably the best category, but if the goal seems to be to find ways to group Apache projects together then maybe we should add SCM? There are other Apache projects like Maven that could fit in that category. We could also add httpd-module as one of our categories. Technically content and library would also probably be valid, but these are really all secondary categories IMO. -- Thanks Mark Phippard http://markphip.blogspot.com/
Re: [Be|Na]gging for review
On 29.12.2010 01:33, Stefan Fuhrmann wrote: Hi devs, I would really like the core improvement of my work on SVN performance to become part of 1.7: the in-process fulltext cache. While the risk is low (we already offer something similar based on memcached), the caching code is so fundamental to most other pending changes, that I want to hear at least a second opinion: ^/subversion/branches/integrate-cache-membuffer Take this as an opportunity for a last nightly deed *k*nightly (which stresses even more that _nightly_ is exactly what we probably don't want) (Ny!) in 2010 or a new year's resolution. -- Stefan^2.
Re: [PATCH] Correct documentation for svn_repos_node_editor and friends
Erik Johansson wrote on Wed, Dec 22, 2010 at 22:00:22 +0100: On Sat, Dec 18, 2010 at 20:43, Daniel Shahaf d...@daniel.shahaf.name wrote: Erik Johansson wrote on Tue, Dec 14, 2010 at 19:19:17 +0100: [[[ r846201 added svn_repos_replay as a replacement not so much for svn_repos_dir_delta, but for at least a whole class of functionality that svn_repos_dir_delta was never intended to handle. One place where svn_repos_replay replaced svn_repos_dir_delta was in combination with svn_repos_node_editor (only used in svnlook). This commit updates the documentation for svn_repos_node_editor and friends to reflect this. * subversion/include/svn_repos.h (svn_repos_node_editor, svn_repos_node_from_baton): Doc update ]]] The patch basically does s/svn_repos_dir_delta2/svn_repos_replay2/g. However, despite the log message, I still don't understand why that is a correct change. According to the log message for r846201 it seems like svn_repos_replay is faster than svn_repos_dir_delta for some operations. I can only assume that this is one such case. IOW, the patch recommends svn_repos_replay2() because that is newer API. Why are we mentioning a specific driver at all? Can svn_repos_node_editor() be driven only by svn_repos_replay2()? (I don't see why that would be the case.) If not, then its doc string should avoid mention that the returned editor can be driven only by %s, when in fact it can be driven by anything. I don't know the anything about the other drivers, but my assumption is that any driver works, but svn_repos_replay is the prefered one in this case. *nod* We have three mechanisms: svn_repos_dir_delta2(), svn_repos_begin_report2(), and svn_repos_replay2(). I think it would be useful to just mention all of them --- after all, none of them is deprecated and all of them can represent a tree delta using an editor. Mention all on equal footing or say that svn_repos_replay2 is the preferred one? I can fix the docs, but I don't know enough about svn api to actually determine the correct fix. E.g. do all the drivers call methods in the editor in the same order? In theory, yes. That's the core of the editor API: the driver only knows the svn_delta_editor_t vtable, and the drivee also knows only the vtable --- they need not (in general) know each other. For example, 'svnrdump dump' drives a commit editor; but nothing in the commit editor knows what dumpfiles are. // Erik -- Erik Johansson Home Page: http://ejohansson.se/ PGP Key: http://ejohansson.se/erik.asc
Re: [PATCH] support printing changed properties in svnlook
Erik Johansson wrote on Wed, Dec 22, 2010 at 23:00:43 +0100: On Fri, Dec 10, 2010 at 14:29, Daniel Shahaf d...@daniel.shahaf.name wrote: + /** How this property entered the node tree: 'A'dd, 'D'elete, 'R'eplace */ + char action; This is copied from svn_repos_node_t-action. There was recently a question about that field: http://mid.gmane.org/3abd28aa-a2fc-4d7d-a502-479d37995...@orcaware.com So, that asks whether 'C'hanged is a valid answer to the question that -action is meant to answer. I'll also ask how this interacts with node changes: for example; if r5 replaces-with-history a node that has 'fooprop' set with another node that also has 'fooprop' set, what would the value of 'action' be? What about this: When a node is deleted all the properties it had are listed in mod_prop with action D. When a node is added-with-history all the properties the source had are listed in mod_prop with action A and a new flag copyfrom = TRUE. A replace-with-history will result in two repos_nodes, each having a mod_prop list. If the same property exists in both it means it has been replaced. (will reply later) + /** The name of the property */ + const char *name; Where is the value of the property? How to get it? The idea was that the struct should indicate changes to properties, not their values. In the same way that svn_repos_node_t shows changes, not node content. Flawed analogy: we never store node content in memory, but we do have all property values in memory, so the cost is just to add an svn_string_t * member to the struct. My question was how would callers get the value if they cared about it. i.e., I assume that whoever calls this function already has a property hash (or, at least, an fs object) available that they can get the property values from? I agree that it is simple to add the value to the struct, but in svnlook (which is the only current in-tree user of this api) I can't see any need to get the value. Yep; but we're designing an API, so I was trying to get it right (ie: general) the first time. IMO, if we're tweaking the API and we can have it put the value there without too much effort, we should do that. (But since svnlook doesn't need that, I'm okay with punting on providing the value in the API if it's more than some threshold effort.) How a caller would get the property value if they indeed cared about it is outside my knowledge of the api. Perhaps svn_repos_node_editor() is not the editor to use if you want the property value? To be honest I don't recall the context exactly, but I imagine that whoever called the API knows whose properties list it was just given, so it could just use another API to get the values if needed. Over RA there are sometimes calling order restrictions, but I don't recall that we have such issues at the repos layer (which is lower). + /** Pointer to the next sibling property */ + struct svn_repos_node_prop_t *sibling; + You use a linked list. How about using apr_array_header_t *? Or a hash of (prop_name - struct)? I guess anyone of those would work, but the reason I went for a linked list was that svn_repos_node_t did that and I wanted them to be similar. Hmm. I haven't seen that struct before: svn_repos_node_t indeed uses an explicit tree structure. What do you think will be more useful to consumers of the API? A hash allows both random access and iteration --- do APR arrays or linked lists have advantages that outweigh that? A hash seemed like overkill, but if you think it is better I have no problem using that. I'm used to seeing 'apr_hash_t *props'. I think an APR array (apr_array_header_t) would be better than an explicit linked list, because we can use library functions rather than rewrite them. Would an apr_hash_t be better still? Possibly, but I'm not feeling strongly on that. (It's not needed for svnlook's use case, and it might be a bit costlier.) // Erik -- Erik Johansson Home Page: http://ejohansson.se/ PGP Key: http://ejohansson.se/erik.asc
Re: [PATCH] support printing changed properties in svnlook
[ The second hunk below has a discussion regarding the representation in the API of 'svn ps k v iota; svn ci iota; svn cp iota iota2; svn ps k2 v2 iota2; svn ci iota2', which I'd be happy to have further input on. ] Erik Johansson wrote on Wed, Dec 22, 2010 at 23:08:30 +0100: On Fri, Dec 10, 2010 at 15:08, Daniel Shahaf d...@daniel.shahaf.name wrote: [ I'm somewhat confused about this issue. ] Erik Johansson wrote on Thu, Dec 09, 2010 at 21:41:38 +0100: On Wed, Dec 8, 2010 at 23:22, Daniel Shahaf d...@daniel.shahaf.name wrote: Erik Johansson wrote on Wed, Dec 08, 2010 at 17:17:47 +0100: + /** How this property entered the node tree: 'A'dd, 'D'elete, 'R'eplace */ + char action; I should have caught this earlier, but replaced doesn't make sense for properties. (A node can be replaced by another node at the same path, but for a fixed node properties can be added/modified/removed but not replaced.) I'll change it to 'A'dd, 'D'elete or 'U'pdate then? 'M'odify, not 'U'pdate, I think. At least that's what the UI level does, but if there's precedent for 'U'pdated then follow it. This is copied from svn_repos_node_t-action. There was recently a question about that field: http://mid.gmane.org/3abd28aa-a2fc-4d7d-a502-479d37995...@orcaware.com So, that asks whether 'C'hanged is a valid answer to the question that -action is meant to answer. I'll also ask how this interacts with node changes: for example; if r5 replaces-with-history a node that has 'fooprop' set with another node that also has 'fooprop' set, what would the value of 'action' be? What about this: When a node is deleted all the properties it had are listed in mod_prop with action D. When a node is added-with-history all the properties the source had are listed in mod_prop with action A and a new flag copyfrom = TRUE. There is already a copyfrom indication in svn_repos_node_t-copyfrom_rev, I don't think we need another one. I was thinking that the copyfrom flag would be useful in the case where you copy a node with a propery and add another property before committing. Here you would have two property with action A and one would have copyfrom = TRUE. In other words, you'd like to make a distinction between This node was added-with-history, and this property came along and ditto, but the property is new on this copy target and didn't exist in the copy source. I see, and I agree it's a good distinction to make. I'm not sure how/whether our other API's (the editor, possibly the reporter, others?) represent this. A replace-with-history will result in two repos_nodes, each having a mod_prop list. If the same property exists in both it means it has been replaced. Two nodes doesn't sound like a good idea; one node marked as 'replaced' should suffice. In the current api a replaced node is represented by two nodes. Should we change this? Doesn't that mean that API callers who want to distinguish a replace from a remove or an add have to walk through all the child entries (for a given dir) before they can process any adds or removals? I'm thinking that one entry is better, because callers can split it if they wish; that's easier than two entries and letting callers figure out for themselves how to recombine them. Should svnlook stay back-wards compatible and print a replaced node twice, once as Deleted and once as Added? I could argue both ways: it should be back-compatible with itself, but it should also be compatible with the cmdline client where the latter prints just one R notification. In case of a replacement, do we want the API to provide the replaced node's properties? (at least their names) I suppose we can agree to answer this as Yes, then? :) It would be pretty simple if there are two separate nodes for a replaced node. The deleted node will have all the properties it had marked as Deleted. The new node will have its properties marked as appropiate. That's a straightforward representation, yes. I suppose we could find a straightforward in the single node case, though --- eg by adding fields only used in the case of 'R'eplacements. // Erik -- Erik Johansson Home Page: http://ejohansson.se/ PGP Key: http://ejohansson.se/erik.asc
Re: [Be|Na]gging for review
On 12/28/2010 6:28 PM, Stefan Fuhrmann wrote: On 29.12.2010 01:33, Stefan Fuhrmann wrote: Hi devs, I would really like the core improvement of my work on SVN performance to become part of 1.7: the in-process fulltext cache. While the risk is low (we already offer something similar based on memcached), the caching code is so fundamental to most other pending changes, that I want to hear at least a second opinion: ^/subversion/branches/integrate-cache-membuffer Take this as an opportunity for a last nightly deed *k*nightly (which stresses even more that _nightly_ is exactly what we probably don't want) (Ny!) in 2010 or a new year's resolution. I'll take a look at it when I have a chance, but my first question is, does it work in a long running (months) persistent multithreaded process that LRU caches svn_fs_t, svn_repos_t and other objects for long periods of time? I have a custom svn server that exposes the svn_fs.h API through an Ice layer (see www.zeroc.com). Thanks, Blair