FSv2 (was: FREE Apache Subversion Meetup...)
On Mon, Oct 18, 2010 at 23:51, Blair Zajac bl...@orcaware.com wrote:

On 10/04/2010 06:45 AM, C. Michael Pilato wrote:

There, you can learn more about what the Meetups tend to look like, what other Meetups are planned for this year's conference, and so on. You'll also find a link to the Subversion Meetup wiki page: http://subversion.open.collab.net/wiki/ApacheConNA2010Meetup

That's the first mention I've seen of FSv2. What ideas are going into it? What problems is it primarily meant to solve?

FSv2 is a hand-wave.

Personally, I see it as a broad swath of API changes to align our needs with the underlying storage. Trowbridge noted that our current API makes it *really* difficult to implement an effective backend. I'd also like to see a backend that allows for parallel PUTs during the commit process.

Hyrum sees FSv2 as some kind of super-key-value storage with layers on top, allowing for various types of high-scaling mechanisms.

Whatever it is... it is a placeholder for a new round of innovation on our repository. It is whatever the community would like to see from a new backend. We haven't *ever* really worked through a discussion of a backend. The BDB just evolved, from before the community was really built up. The FSFS backend was dropped in as a fait accompli. And thereafter, it was small evolutions for the backends.

Cheers, -g
RE: checksum error
No clues?

-----Original Message-----
From: Edward Ned Harvey [mailto:s...@nedharvey.com]
Sent: Saturday, October 16, 2010 8:19 AM
To: dev@subversion.apache.org
Subject: checksum error

I have master and slave servers, in the US and India. They are both 1.6.12, but the slave was 1.5.7 until a few days ago. The master is at rev 5050, but the slave will only sync up to rev 5045. Every time it tries to sync 5046, I get a checksum error in the apache error_log. It says some file fails md5.

In order to figure out where the corruption is, I tried using a svn client to simply check out that file from the master, at rev 5044, 5045, and 5046. Had no problem. Correct me if I'm wrong, but if the corruption were on the server, I should have gotten the error during that operation, right? Does the svn sync slave do md5 checksums that the svn client doesn't do?

If I restore the slave from backup and re-try the sync, it consistently fails at 5045 trying 5046, always on the same file, and always with the same mismatched checksum.

Since we're drawing the conclusion that the corruption is in the slave, it must mean that a past version of something was corrupted and undetected. In fact, when I upgraded from 1.5.7 to 1.6.12, I did a dump/load. So it seems that if I had silent corruption on disk, I should have discovered the problem at that time, right? Unless perhaps ... 1.5 didn't do checksums, so the 1.6 dump/load has nothing from the past to verify... Does 1.6 do some new md5 checksumming that 1.5 didn't do?

Anyway, I'm transferring a fresh copy of the repo to the slave. I figure this should fix it. But I'm extremely curious how this all happened.
Re: FSv2 (was: FREE Apache Subversion Meetup...)
Greg Stein wrote on Tue, Oct 19, 2010 at 04:31:42 -0400: Personally, I see [FSv2] as a broad swath of API changes to align our needs with the underlying storage. Trowbridge noted that our current API makes it *really* difficult to implement an effective backend. I'd also like to see a backend that allows for parallel PUTs during the commit process. Hyrum sees FSv2 as some kind of super-key-value storage with layers on top, allowing for various types of high-scaling mechanisms. At the retreat, stefan2 also had some thoughts about this...
Fwd: board: r25274 - /foundation/board/board_agenda_2010_10_20.txt
There is a Board meeting tomorrow. A couple weeks ago, I forwarded my report to the Board for y'all to see here. One of the feedback items was that we may want to consider how we present the prior development lines' licensing on our source code page. Where did we document our release/support policy about 1.5.x for security fixes only, 1.6.x is current, and we're working on 1.7.0? Maybe we could have a section on the bottom of source-code.html that describes this, and we could also document the licensing and not an ASF release nature? Just brainstorming here... I think it is a reasonable request, so... ideas? Cheers, -g -- Forwarded message -- From: curc...@apache.org Date: Tue, Oct 19, 2010 at 09:00 Subject: board: r25274 - /foundation/board/board_agenda_2010_10_20.txt ... @@ -574,6 +578,9 @@ the Board was aware. maintenance releases of development-lines from pre-ASF introduction is a unique concern. + sc: thanks! Note that we should probably call this fact + out more clearly on http://subversion.apache.org/source-code.html + to ensure users understand who's releasing the product. ] AK. Apache Harmony Project [Tim Ellison / Noirin]
Upgrading partially relocated working copies
In 1.7 we insist that switch --relocate only acts on a whole working copy. In 1.6 we allow parts of the working copy to be relocated. What should 1.7 do when upgrading a partially switched working copy? At present it ignores partial relocation and upgrades the whole working copy based on the repository root used in the working copy root. Should it detect the partial relocation and throw an error? -- Philip
Re: FSv2 (was: FREE Apache Subversion Meetup...)
On Tue, 2010-10-19 at 04:31 -0400, Greg Stein wrote: The FSFS backend was dropped in as a fait accompli. A minor correction: ra_svn was dropped in as a fait accompli. FSFS was, as far as I remember, a pretty open process where I created a design and Josh Pieper implemented it. You can look at the commit history of libsvn_fs_fs to see that, and I'm pretty sure that Josh and I were working over the expected open channels (dev list and IRC) at the time.
Re: FSv2 (was: FREE Apache Subversion Meetup...)
On Tue, Oct 19, 2010 at 11:04, Greg Hudson ghud...@mit.edu wrote: On Tue, 2010-10-19 at 04:31 -0400, Greg Stein wrote: The FSFS backend was dropped in as a fait accompli. A minor correction: ra_svn was dropped in as a fait accompli. FSFS was, as far as I remember, a pretty open process where I created a design and Josh Pieper implemented it. You can look at the commit history of libsvn_fs_fs to see that, and I'm pretty sure that Josh and I were working over the expected open channels (dev list and IRC) at the time. Oh! Well, I'll assume your memory is better than mine. My apologies for the comment. Thx for the clarification, -g
Re: Fwd: board: r25274 - /foundation/board/board_agenda_2010_10_20.txt
On 10/19/2010 10:29 AM, Greg Stein wrote: Maybe we could have a section on the bottom of source-code.html that describes this, and we could also document the licensing and not an ASF release nature? Just brainstorming here... +1. -- C. Michael Pilato cmpil...@collab.net CollabNet www.collab.net Distributed Development On Demand
massive memory leak
Hi,

I'm still using r1023755 from trunk, but I haven't seen a commit which I think would fix this: There's a massive memory leak somewhere. I can't check out even small projects anymore, since the memory consumption rises very fast and reaches the limit of my available RAM (6GB) after about the first 100 files. After that, the system isn't usable anymore because of the constant swapping.

I then tried an update of the interrupted checkout (had to hard reset the system because it wasn't reacting anymore), but the update too uses up all memory in a few seconds.

Switching to neon instead of the default serf fixes the problem. So this is limited to either serf itself or ra_serf. I'm using serf 0.7.0.

Stefan

--
De Chelonian Mobile
TortoiseSVN
The coolest Interface to (Sub)Version Control
http://tortoisesvn.net
Re: massive memory leak
Stefan Küng tortoise...@gmail.com writes:

I'm still using r1023755 from trunk, but I haven't seen a commit which I think would fix this: There's a massive memory leak somewhere. I can't check out even small projects anymore, since the memory consumption rises very fast and reaches the limit of my available RAM (6GB) after about the first 100 files. After that, the system isn't usable anymore because of the constant swapping. I then tried an update of the interrupted checkout (had to hard reset the system because it wasn't reacting anymore), but the update too uses up all memory in a few seconds. Switching to neon instead of the default serf fixes the problem. So this is limited to either serf itself or ra_serf. I'm using serf 0.7.0.

On my Linux machine I can check out Subversion trunk using serf, and the memory used by the process (as reported by top) is about 150MB. Neon uses slightly less memory, about 130MB. I'm using Subversion r1024287, serf 0.7.0, and apr 1.2.12.

Does the problem occur with one big file? A hundred small files in one directory? A hundred nested directories with no files? With http-compression? Over https?

-- Philip
Re: massive memory leak
On 10/19/2010 12:41 PM, Stefan Küng wrote: I tried to check out npp: https://notepad-plus.svn.sourceforge.net/svnroot/notepad-plus/trunk Nothing special about file sizes or number of files. Can you try it without SSL? (Is that possible?) I seem to recall Paul Burba looking into and solving an SSL-only massive ra_serf memory leak recently. -- C. Michael Pilato cmpil...@collab.net CollabNet www.collab.net Distributed Development On Demand
Re: massive memory leak
On 19.10.2010 18:46, C. Michael Pilato wrote: On 10/19/2010 12:41 PM, Stefan Küng wrote: I tried to check out npp: https://notepad-plus.svn.sourceforge.net/svnroot/notepad-plus/trunk Nothing special about file sizes or number of files. Can you try it without SSL? (Is that possible?) I seem to recall Paul Burba looking into and solving an SSL-only massive ra_serf memory leak recently. Is there a public server without ssl available? I only have ssl-enabled test servers set up at home. Stefan
Re: massive memory leak
On 10/19/2010 12:53 PM, Stefan Küng wrote: On 19.10.2010 18:46, C. Michael Pilato wrote: On 10/19/2010 12:41 PM, Stefan Küng wrote: I tried to check out npp: https://notepad-plus.svn.sourceforge.net/svnroot/notepad-plus/trunk Nothing special about file sizes or number of files. Can you try it without SSL? (Is that possible?) I seem to recall Paul Burba looking into and solving an SSL-only massive ra_serf memory leak recently. Is there a public server without ssl available? I only have ssl-enabled test servers set up at home. You can compare checkouts of https://svn.apache.org/repos/asf/subversion/trunk and http://svn.apache.org/repos/asf/subversion/trunk -- C. Michael Pilato cmpil...@collab.net CollabNet www.collab.net Distributed Development On Demand
Re: massive memory leak
On 19.10.2010 18:56, C. Michael Pilato wrote: On 10/19/2010 12:53 PM, Stefan Küng wrote: On 19.10.2010 18:46, C. Michael Pilato wrote: On 10/19/2010 12:41 PM, Stefan Küng wrote: I tried to check out npp: https://notepad-plus.svn.sourceforge.net/svnroot/notepad-plus/trunk Nothing special about file sizes or number of files. Can you try it without SSL? (Is that possible?) I seem to recall Paul Burba looking into and solving an SSL-only massive ra_serf memory leak recently. Is there a public server without ssl available? I only have ssl-enabled test servers set up at home. You can compare checkouts of https://svn.apache.org/repos/asf/subversion/trunk and http://svn.apache.org/repos/asf/subversion/trunk Ups, completely forgot that this was also accessible via http, not just https :) Testing without ssl shows no memory leak at all. So this is related to ssl. Stefan
Re: massive memory leak
C. Michael Pilato cmpil...@collab.net writes:

On 10/19/2010 12:41 PM, Stefan Küng wrote:

I tried to check out npp: https://notepad-plus.svn.sourceforge.net/svnroot/notepad-plus/trunk Nothing special about file sizes or number of files.

Can you try it without SSL? (Is that possible?) I seem to recall Paul Burba looking into and solving an SSL-only massive ra_serf memory leak recently.

Using that SSL URL the memory is leaked on my Linux box using serf. When I replace https with http the checkout starts and appears not to leak memory, but it fails randomly, e.g. like this:

../src/subversion/libsvn_wc/update_editor.c:4189: (apr_err=200014)
svn: Checksum mismatch for '/home/pm/sw/subversion/obj/x1/scintilla/cocoa/ScintillaView.mm':
   expected: 10fdb67ead5e76cc5f9ac1a147144511
     actual: 37ac596ba413089673fe32c6986c079f

or like this:

A    x1/scintilla/cocoa/ScintillaView.h
../src/subversion/svn/checkout-cmd.c:172: (apr_err=20014)
../src/subversion/libsvn_client/checkout.c:246: (apr_err=20014)
../src/subversion/libsvn_client/update.c:295: (apr_err=20014)
../src/subversion/libsvn_client/update.c:295: (apr_err=20014)
../src/subversion/libsvn_client/update.c:218: (apr_err=20014)
../src/subversion/libsvn_ra_serf/update.c:2332: (apr_err=20014)
svn: Error retrieving REPORT (20014): Internal error

or like this:

A    x1/scintilla/zipsrc.bat
../src/subversion/svn/checkout-cmd.c:172: (apr_err=175009)
../src/subversion/libsvn_client/checkout.c:246: (apr_err=175009)
../src/subversion/libsvn_client/update.c:295: (apr_err=175009)
../src/subversion/libsvn_client/update.c:295: (apr_err=175009)
../src/subversion/libsvn_client/update.c:218: (apr_err=175009)
../src/subversion/libsvn_ra_serf/update.c:2329: (apr_err=175009)
../src/subversion/libsvn_ra_serf/util.c:1591: (apr_err=175009)
../src/subversion/libsvn_ra_serf/util.c:1591: (apr_err=175009)
../src/subversion/libsvn_ra_serf/util.c:1298: (apr_err=175009)
../src/subversion/libsvn_ra_serf/util.c:1298: (apr_err=175009)
svn: XML parsing failed: (207 Multi-Status)

-- Philip
Re: massive memory leak
Philip Martin philip.mar...@wandisco.com writes: Using that SSL URL the memory is leaked on my Linux box using serf. When I replace https with http the checkout starts and appears not to leak memory, but it fails randomly, e.g. like this: Checkouts work using neon for http and https. -- Philip
Re: FSv2 (was: FREE Apache Subversion Meetup...)
On Tue, Oct 19, 2010 at 12:12 PM, Blair Zajac bl...@orcaware.com wrote:

... 3) Pools are painful to use. We have repository, revision and transaction C++ objects stored in an LRU cache. They cache revision and transaction roots for improved performance. Using the wrong pool for a RPC method can cause memory leaks (we just found one Monday causing a backend server to run out of memory). Constructing and destroying pools in the wrong order can cause the process to crash. This is hard to get right, so using a different model would be very useful. I haven't had the cycles to look at Hyrum's new C++ object and see how that would help.

While I'd appreciate the extra eyeballs, don't get your hopes up. The objects are still being managed and stored in pools internally, which causes all kinds of overheads. It's probably not nearly as performant as you're looking for (though it does hide the complexity of Pools from C++ consumers, which is a big part of the design goals).

-Hyrum
Re: massive memory leak
On Tue, Oct 19, 2010 at 6:58 PM, Stefan Küng tortoise...@gmail.com wrote: On 19.10.2010 18:56, C. Michael Pilato wrote: On 10/19/2010 12:53 PM, Stefan Küng wrote: On 19.10.2010 18:46, C. Michael Pilato wrote: On 10/19/2010 12:41 PM, Stefan Küng wrote: I tried to check out npp: https://notepad-plus.svn.sourceforge.net/svnroot/notepad-plus/trunk Nothing special about file sizes or number of files. Can you try it without SSL? (Is that possible?) I seem to recall Paul Burba looking into and solving an SSL-only massive ra_serf memory leak recently. Is there a public server without ssl available? I only have ssl-enabled test servers set up at home. You can compare checkouts of https://svn.apache.org/repos/asf/subversion/trunk and http://svn.apache.org/repos/asf/subversion/trunk Ups, completely forgot that this was also accessible via http, not just https :) Testing without ssl shows no memory leak at all. So this is related to ssl. This has been fixed in serf trunk r1408 for a while, but hasn't shown up in a serf patch release yet. Lieven
Re: massive memory leak
On 19.10.2010 20:13, Lieven Govaerts wrote: This has been fixed in serf trunk r1408 for a while, but hasn't shown up in a serf patch release yet. Sorry, I should have checked the serf commits first. Thanks for the update on this. Stefan
Re: massive memory leak
On Tue, Oct 19, 2010 at 8:14 PM, Stefan Küng tortoise...@gmail.com wrote: On 19.10.2010 20:13, Lieven Govaerts wrote: This has been fixed in serf trunk r1408 for a while, but hasn't shown up in a serf patch release yet. Sorry, I should have checked the serf commits first. Thanks for the update on this.

Also see: http://subversion.tigris.org/issues/show_bug.cgi?id=3684 - ra_serf has a memory usage problem

There, Paul Burba describes exactly this problem (memory leak with serf 0.7.0 over SSL, no leak with http), and confirms that serf@1408 fixes it.

Cheers, -- Johan
Re: massive memory leak
On Tue, Oct 19, 2010 at 1:13 PM, Lieven Govaerts svn...@mobsol.be wrote: On Tue, Oct 19, 2010 at 6:58 PM, Stefan Küng tortoise...@gmail.com wrote: On 19.10.2010 18:56, C. Michael Pilato wrote: On 10/19/2010 12:53 PM, Stefan Küng wrote: On 19.10.2010 18:46, C. Michael Pilato wrote: On 10/19/2010 12:41 PM, Stefan Küng wrote: I tried to check out npp: https://notepad-plus.svn.sourceforge.net/svnroot/notepad-plus/trunk Nothing special about file sizes or number of files. Can you try it without SSL? (Is that possible?) I seem to recall Paul Burba looking into and solving an SSL-only massive ra_serf memory leak recently. Is there a public server without ssl available? I only have ssl-enabled test servers set up at home. You can compare checkouts of https://svn.apache.org/repos/asf/subversion/trunk and http://svn.apache.org/repos/asf/subversion/trunk Ups, completely forgot that this was also accessible via http, not just https :) Testing without ssl shows no memory leak at all. So this is related to ssl. This has been fixed in serf trunk r1408 for a while, but hasn't shown up in a serf patch release yet. Any word on when that might happen? -Hyrum
diff4: is it actually used?
Hi devs,

In the context of the diff optimization patch I'm working on ([1]), I'm wondering if diff4 is actually used in svn. If I look for usages of subversion/libsvn_diff/diff4.c#svn_diff_diff4, I only come up with tools/diff/diff4.c#main.

So: this code isn't used in the svn core itself? What's the use of the diff4 executable in tools/diff? Is it for experimentation with that implementation, which never really got finalized to be taken up in svn proper? Or am I overlooking something?

Also, tools/diff/diff4.c mentions the following description above the copyright notice: diff4-test.c -- test driver for 3-way text merges. Wrong filename. And test driver, what does that mean?

Background: I'm working on extending my patch to work for all diffs in svn: normal diff, diff3 and diff4 (to avoid having to special-case things only for regular diff, and the patch is just as useful for diffX as for regular diff). I don't want to break anything, so I was looking around for ways to exercise that code. Should I take diff4 into account, and if so, how can I test it?

Cheers, -- Johan

[1] http://svn.haxx.se/dev/archive-2010-10/0167.shtml
Re: FSv2 (was: FREE Apache Subversion Meetup...)
On Tue, Oct 19, 2010 at 13:12, Blair Zajac bl...@orcaware.com wrote: On 10/19/2010 01:31 AM, Greg Stein wrote: ... Personally, I see it as a broad swath of API changes to align our needs with the underlying storage. Trowbridge noted that our current API makes it *really* difficult to implement an effective backend. I'd also like to see a backend that allows for parallel PUTs during the commit process. Hyrum sees FSv2 as some kind of super-key-value storage with layers on top, allowing for various types of high-scaling mechanisms. How would that API look? The API as it is is pretty clear. Editor v2. There are a lot more constraints around modification sequences, atomicity, and whatnot. The current, unconstrained API was a major issue for Trow (e.g allowing a delete, an add, then a delete again; multiple propset changes; stuff like that). Switching it to Ev2 would also let us plug the FS more directly into how other portions of SVN will make edits to a tree. ... 2) I would like to ensure that the new backend supports multiple modifications to the same node. I don't know if this was designed into the current backend, but given I expose svn_fs.h over RPC, clients can make any one or multiple modifications to the tree, so the new backend should support this. Why? How do you actually use this feature? These multiple edits are part of the problem for certain backend designs. ... out of memory). Constructing and destroying pools in the wrong order can cause the process to crash. This is hard to get right, so using a different model would be very useful. We may be able to use pocore's pools and cleanups, which are more flexible. pocore will be used by serf (and possibly ra_serf), so it will typically be around the system during an svn build/run. Cheers, -g
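To make the kind of constraint discussed above concrete, here is a minimal Python sketch. It is not the Editor v2 API; the class and method names are hypothetical and chosen only for illustration. It models an edit drive that accepts at most one structural change per path, so a "delete, add, delete again" sequence on the same node is rejected up front rather than left for the backend to untangle.

```python
class ConstrainedEdit:
    """Hypothetical edit drive: each path may be touched at most once."""

    def __init__(self):
        self.ops = {}  # path -> name of the single pending operation

    def _record(self, path, op):
        # Refuse a second operation on a path that already has one pending.
        if path in self.ops:
            raise ValueError(
                f"{path!r} already has a pending {self.ops[path]!r}; "
                f"cannot also {op!r} it in the same edit")
        self.ops[path] = op

    def add(self, path):
        self._record(path, "add")

    def delete(self, path):
        self._record(path, "delete")


edit = ConstrainedEdit()
edit.delete("trunk/a.c")
try:
    edit.add("trunk/a.c")       # second change to the same node
except ValueError as err:
    rejected = str(err)
```

Under this assumption a backend only ever sees one well-defined change per node, which is the property that makes some storage designs easier; the cost is that callers must combine their changes before driving the edit.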
Re: FSv2 (was: FREE Apache Subversion Meetup...)
On 10/19/2010 02:33 PM, Greg Stein wrote: On Tue, Oct 19, 2010 at 13:12, Blair Zajacbl...@orcaware.com wrote: On 10/19/2010 01:31 AM, Greg Stein wrote: ... Personally, I see it as a broad swath of API changes to align our needs with the underlying storage. Trowbridge noted that our current API makes it *really* difficult to implement an effective backend. I'd also like to see a backend that allows for parallel PUTs during the commit process. Hyrum sees FSv2 as some kind of super-key-value storage with layers on top, allowing for various types of high-scaling mechanisms. How would that API look? The API as it is is pretty clear. Editor v2. There are a lot more constraints around modification sequences, atomicity, and whatnot. The current, unconstrained API was a major issue for Trow (e.g allowing a delete, an add, then a delete again; multiple propset changes; stuff like that). Yeah, I wouldn't want those additional constraints. Switching it to Ev2 would also let us plug the FS more directly into how other portions of SVN will make edits to a tree. ... 2) I would like to ensure that the new backend supports multiple modifications to the same node. I don't know if this was designed into the current backend, but given I expose svn_fs.h over RPC, clients can make any one or multiple modifications to the tree, so the new backend should support this. Why? How do you actually use this feature? These multiple edits are part of the problem for certain backend designs. It's not me, but clients of the API I expose, that allows any arbitrary modification to a filesystem root. I like to think of the svn backend as a versioned filesystem without any particular limitations on how it's used, which I think is really useful. Not too many people use this, but svn has basically the only versioned filesystem with a good API, stable ABI, good performance. 
In previous incarnations of the versioned asset management system, users didn't have to think about what modifications they could or couldn't make and in what order, so in the current implementation, they can do the same thing being backed by svn, and get versioning provided by svn. Putting this constraint on a newer backend wouldn't work for me, unless I batch up operations from the client and apply them in one editor.

If we went with a key/value store, then it seems (without thinking about it too much) that allowing multiple edits wouldn't be that hard, as it could just replace an existing key/value with updated data?

out of memory). Constructing and destroying pools in the wrong order can cause the process to crash. This is hard to get right, so using a different model would be very useful.

We may be able to use pocore's pools and cleanups, which are more flexible. pocore will be used by serf (and possibly ra_serf), so it will typically be around the system during an svn build/run.

Cool, thanks. I'll have to check that out. Does it provide a compatibility layer, in that you could pass pocore allocated memory to svn functions?

Blair
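The idea of batching client operations, or of letting a key/value store absorb repeated edits by replacing the stored value, can be sketched as follows. This is only an illustration under stated assumptions: the store is a plain Python dict, and `KVTransaction` and its methods are hypothetical names, not part of any Subversion API.

```python
class KVTransaction:
    """Hypothetical transaction buffering edits on top of a key/value store."""

    def __init__(self, store):
        self.store = store   # committed data: path -> contents
        self.pending = {}    # uncommitted edits; None marks a delete

    def put(self, path, contents):
        # A later edit to the same path simply replaces the earlier one.
        self.pending[path] = contents

    def delete(self, path):
        self.pending[path] = None

    def commit(self):
        # Only the final state of each path reaches the store.
        for path, contents in self.pending.items():
            if contents is None:
                self.store.pop(path, None)
            else:
                self.store[path] = contents
        self.pending.clear()


store = {"trunk/a.c": "v1"}
txn = KVTransaction(store)
txn.delete("trunk/a.c")
txn.put("trunk/a.c", "v2")
txn.put("trunk/a.c", "v3")   # third edit to the same node
txn.commit()
```

Here the client is free to edit a node as often as it likes, while the store receives a single net change per path at commit time. Whether a real backend could work this way depends on details (history, copies, properties) that this sketch ignores.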
Re: checksum error
On Sat, Oct 16, 2010 at 8:18 AM, Edward Ned Harvey s...@nedharvey.com wrote:

I have master and slave servers, in the US and India. They are both 1.6.12, but the slave was 1.5.7 until a few days ago. The master is at rev 5050, but the slave will only sync up to rev 5045. Every time it tries to sync 5046, I get a checksum error in the apache error_log. It says some file fails md5. In order to figure out where the corruption is, I tried using a svn client to simply check out that file from the master, at rev 5044, 5045, and 5046. Had no problem. Correct me if I'm wrong, but if the corruption were on the server, I should have gotten the error during that operation, right? Does the svn sync slave do md5 checksums that the svn client doesn't do? If I restore the slave from backup and re-try the sync, it consistently fails at 5045 trying 5046, always on the same file, and always with the same mismatched checksum.

Did you try running 'svnadmin verify' on either repository?

Since we're drawing the conclusion that the corruption is in the slave, it must mean that a past version of something was corrupted and undetected. In fact, when I upgraded from 1.5.7 to 1.6.12, I did a dump/load. So it seems that if I had silent corruption on disk, I should have discovered the problem at that time, right? Unless perhaps ... 1.5 didn't do checksums, so the 1.6 dump/load has nothing from the past to verify...

Oh, this actually exercises some of the same code... yes, I would have expected dump to fail. Perhaps it was an on-the-wire bug. I seem to recall an instance where svnsync could fail in this way, but don't know the details off the top of my head.

-John
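One way to narrow down where the mismatch arises, independent of Subversion itself, is to export the affected file from both servers and compare MD5 digests locally. The following Python sketch does only that comparison; the helper names are hypothetical, and the two file paths are assumed to be copies you have already exported yourself.

```python
import hashlib


def md5_of_file(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True if both files hash to the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

If the copies exported from master and slave at the last revision that synced (5045 in the report above) already differ, the slave's stored data is the likelier culprit; if they match, the problem is more likely in how revision 5046 is transmitted or applied.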
Re: [WIP PATCH] Make svn_diff_diff skip identical prefix and suffix to make diff and blame faster
On Tue, Oct 12, 2010 at 12:10 PM, Julian Foad julian.f...@wandisco.com wrote: On Tue, 2010-10-12 at 00:31 +0200, Johan Corveleyn wrote: On Mon, Oct 11, 2010 at 11:53 AM, Julian Foad julian.f...@wandisco.com wrote: On Sat, 2010-10-09, Johan Corveleyn wrote: On Sat, Oct 9, 2010 at 2:57 AM, Julian Foad julian.f...@wandisco.com wrote: So I wrote a patch - attached - that refactors this into an array of 4 sub-structures, and simplifies all the code that uses them. [...] Yes, great idea! That would indeed vastly simplify a lot of the code. So please go ahead and commit the refactoring. OK, committed in r1021282. Thanks, looks much more manageable now. I'd like to see a simplified version of your last patch, taking advantage of that, before you go exploring other options. Ok, here's a new version of the patch, taking advantage of your file_info refactoring. This vastly simplifies the code, so that it might actually be understandable now :-). Other things I've done in this version: 1) Generalized everything to handle an array of datasources/files, instead of just two. This makes it slightly more complex here and there (using for loops everywhere), but I think it's ok, and it's also more consistent/generic. If anyone has better ideas to do those for loops, suggestions welcome. This makes the algorithm usable by diff3 as well (and diff4 if needed (?)). I have not yet enabled it for diff3, because I haven't yet understood how it handles the generation of its diff output (needs to take into account the prefix_lines. I tried some quick hacks, but lots of tests were failing, so I'll have to look more into it - that's for a follow up patch). When I can enable it for diff3 (and diff4), I can remove datasource_open (with one datasource). 2) Removed get_prefix_lines from svn_diff_fns_t (and its implementations in diff_file.c and diff_memory.c). Instead I pass prefix_lines directly to token.c#svn_diff__get_tokens. 
3) If prefix scanning ended in the last chunk, the suffix scanning now reuses that buffer which already contains the last chunk. As a special case, this also avoids reading the file twice if it's smaller than 128 Kb.

4) Added doc strings everywhere. Feel free to edit those, I'm new at documenting things in svn.

Still TODO:
- revv svn_diff_fns_t and maybe other stuff I've changed in public API.
- See if implementing the critical parts of increment_pointers and decrement_pointers in a macro improves performance.
- Add support for -x-b, -x-w, and -x--ignore-eol-style options. For this (and for other reasons), I'd still like to investigate pushing this optimization into the token parsing/handling layer, to extract entire tokens etc., even if this means the current patch has to be thrown away. I'll take this up in a separate thread.

Log message:
[[[
Make svn_diff skip identical prefix and suffix to make diff and blame faster.

* subversion/include/svn_diff.h
  (svn_diff_fns_t): Added new function type datasources_open to the vtable.

* subversion/libsvn_diff/diff_memory.c
  (datasources_open): New function (does nothing).
  (svn_diff__mem_vtable): Added new function datasources_open.

* subversion/libsvn_diff/diff_file.c
  (svn_diff__file_baton_t): Added member prefix_lines, and inside the struct file_info the members suffix_start_chunk and suffix_offset_in_chunk.
  (increment_pointers, decrement_pointers): New functions.
  (is_one_at_bof, is_one_at_eof): New functions.
  (find_identical_prefix, find_identical_suffix): New functions.
  (datasources_open): New function, to open multiple datasources and find their identical prefix and suffix, so these can be excluded from the rest of the diff algorithm, as a performance optimization. From the identical suffix, 50 lines are kept to help the diff algorithm find the nicest possible diff representation in case of ambiguity.
  (datasource_get_next_token): Stop at start of identical suffix.
  (svn_diff__file_vtable): Added new function datasources_open.

* subversion/libsvn_diff/diff.h
  (svn_diff__get_tokens): Added argument datasource_opened, to indicate that the datasource was already opened, and argument prefix_lines, the number of identical prefix lines, which is used as the starting offset for the tokens we're getting.

* subversion/libsvn_diff/token.c
  (svn_diff__get_tokens): Added arguments datasource_opened and prefix_lines. Only open the datasource if datasource_opened is FALSE. Set the starting offset of the position list to the number of prefix_lines.

* subversion/libsvn_diff/lcs.c
  (svn_diff__lcs): Added argument prefix_lines. Use this to correctly set the offset of the sentinel position for EOF, even if one of the files became empty after eliminating the identical prefix.

* subversion/libsvn_diff/diff.c
  (svn_diff__diff): Add a chunk of common diff for identical prefix.
  (svn_diff_diff): Use new function datasources_open to open original and modified at once and find their
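The core idea of the patch, stripping the identical prefix and suffix before running the expensive diff algorithm while keeping some suffix lines as context, can be illustrated with a short Python sketch. This is not the patch's C implementation, which scans file chunks byte by byte; it works on in-memory lists of lines, and the function name and `keep_suffix` parameter are hypothetical.

```python
def trim_common_ends(a, b, keep_suffix=50):
    """Return (prefix_lines, a_middle, b_middle) for two lists of lines.

    prefix_lines is the number of identical leading lines that were
    skipped.  The identical trailing lines are skipped too, except for
    up to keep_suffix of them, which stay in the middle sections.
    """
    limit = min(len(a), len(b))

    # Count identical lines from the start.
    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1

    # Count identical lines from the end, never overlapping the prefix.
    suffix = 0
    while suffix < limit - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1

    # Leave some of the common suffix in place for the diff algorithm.
    suffix = max(0, suffix - keep_suffix)

    return prefix, a[prefix:len(a) - suffix], b[prefix:len(b) - suffix]


old = ["x", "1", "y", "z"]
new = ["x", "2", "y", "z"]
skipped, old_mid, new_mid = trim_common_ends(old, new, keep_suffix=1)
```

Only the middle sections need to go through the LCS computation; the returned prefix count plays the role that prefix_lines plays in the log message, namely the offset to add back when reporting line positions.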
Re: svn commit: r1024416 - /subversion/site/publish/index.html
I think that we want &reg; instead, since we actually have a *registered* trademark, unlike all other Apache projects.

On Tue, Oct 19, 2010 at 17:21, hwri...@apache.org wrote:

Author: hwright
Date: Tue Oct 19 21:21:33 2010
New Revision: 1024416

URL: http://svn.apache.org/viewvc?rev=1024416&view=rev
Log:
Add a trademark symbol near the first use of Apache Subversion on the main page, in accordance with the ASF guidelines: http://www.apache.org/foundation/marks/pmcs

* publish/index.html:
  (site-content): Add a 'tm' next to the first use of the Apache Subversion name.

Modified:
    subversion/site/publish/index.html

Modified: subversion/site/publish/index.html
URL: http://svn.apache.org/viewvc/subversion/site/publish/index.html?rev=1024416&r1=1024415&r2=1024416&view=diff
==============================================================================
--- subversion/site/publish/index.html (original)
+++ subversion/site/publish/index.html Tue Oct 19 21:21:33 2010
@@ -21,8 +21,8 @@
 Enterprise-class centralized version control for the masses</p>
 
 <p>Welcome to <strong>subversion.apache.org</strong>, the online home
-   of the Apache Subversion software project. Subversion is an open source
-   version control system. Founded in 2000 by CollabNet, Inc., the
+   of the Apache Subversion&trade; software project. Subversion is an open
+   source version control system. Founded in 2000 by CollabNet, Inc., the
    Subversion project and software have seen incredible success over
    the past decade. Subversion has enjoyed and continues to enjoy
    widespread adoption in both the open source arena and the corporate
Re: svn commit: r1024442 - in /subversion/site/publish: index.html site-nav.html
I'm not sure that we need a Foundation link in the site navigation menu. That is well-covered in the new prose you added.

Also, I'd like to remove the www.apache.org link from the site navigation. We have a link in the upper-right, at the bottom-left, and in the main text of the landing page. Those locations are more than enough for the guidelines.

The idea here is that we can shrink that left-nav to its minimum.

Thoughts?

Cheers,
-g

On Tue, Oct 19, 2010 at 18:37, danie...@apache.org wrote:
> Author: danielsh
> Date: Tue Oct 19 22:37:46 2010
> New Revision: 1024442
>
> URL: http://svn.apache.org/viewvc?rev=1024442&view=rev
> Log:
> Tweak the About the ASF navbar entry.
>
> * /site/publish/site-nav.html
>   (About the ASF): Add <acronym/> tag and Foundation link.
>
> * /site/publish/index.html
>   (site-overview): Add site-overview-asf section.
>
> Modified:
>     subversion/site/publish/index.html
>     subversion/site/publish/site-nav.html
>
> Modified: subversion/site/publish/index.html
> URL: http://svn.apache.org/viewvc/subversion/site/publish/index.html?rev=1024442&r1=1024441&r2=1024442&view=diff
> ==============================================================================
> --- subversion/site/publish/index.html (original)
> +++ subversion/site/publish/index.html Tue Oct 19 22:37:46 2010
> @@ -205,6 +205,26 @@
>
>  </div> <!-- #site-overview-community -->
>
> +<div class="h3" id="site-overview-asf">
> +<h3>The About the ASF Section
> +  <a class="sectionlink" href="#site-overview-asf"
> +    title="Link to this section">&para;</a>
> +</h3>
> +
> +<!-- see http://www.apache.org/foundation/marks/pmcs -->
> +<p>The rest of this site is about Subversion &mdash; but Subversion doesn't
> +   operate in vaccum. It is part of the <a href="http://www.apache.org">Apache
> +   Software Foundation (ASF)</a>, which &mdash; in addition to the servers that
> +   run this site and our mailing lists &mdash; provides financial, technical,
> +   and legal backing. The About the ASF section contains links that relate
> +   to the <a href="http://www.apache.org/foundation/">the Foundation</a> as
> +   a whole. It lists our
> +   <a href="http://www.apache.org/foundation/thanks.html">sponsors</a> and
> +   allows you to <a href="http://www.apache.org/foundation/sponsorship.html">
> +   donate</a> if you wish.</p>
> +
> +</div> <!-- #site-overview-asf -->
> +
>  </div> <!-- #site-overview -->
>
>  </div> <!-- #site-content -->
>
> Modified: subversion/site/publish/site-nav.html
> URL: http://svn.apache.org/viewvc/subversion/site/publish/site-nav.html?rev=1024442&r1=1024441&r2=1024442&view=diff
> ==============================================================================
> --- subversion/site/publish/site-nav.html (original)
> +++ subversion/site/publish/site-nav.html Tue Oct 19 22:37:46 2010
> @@ -25,11 +25,13 @@
>        <li><a href="/contributing.html">Getting Involved</a></li>
>      </ul>
>    </li>
> -  <li>About the ASF
> +  <li>About the <acronym title="Apache Software Foundation">ASF</acronym>
>      <ul>
>        <li><a target="_blank" class="linkaway"
>               href="http://www.apache.org/">Apache.org</a></li>
>        <li><a target="_blank" class="linkaway"
> +             href="http://www.apache.org/foundation/">Foundation</a></li>
> +      <li><a target="_blank" class="linkaway"
>               href="http://www.apache.org/licenses/">Licenses</a></li>
>        <li><a target="_blank" class="linkaway"
>               href="http://www.apache.org/foundation/sponsorship.html">Donate</a></li>
Diff optimization: implement prefix/suffix-skipping in token-handling code (was: Re: [WIP PATCH] Make svn_diff_diff skip identical prefix and suffix to make diff and blame faster)
On Tue, Oct 12, 2010 at 12:35 PM, Julian Foad julian.f...@wandisco.com wrote:
> On Sun, 2010-10-10 at 23:43 +0200, Johan Corveleyn wrote:
>> On Sat, Oct 9, 2010 at 2:21 PM, Johan Corveleyn jcor...@gmail.com wrote:
>>> On Sat, Oct 9, 2010 at 2:57 AM, Julian Foad julian.f...@wandisco.com wrote:
>>>> But this makes me think, it looks to me like this whole
>>>> prefix-suffix-skipping functionality would fit better inside the
>>>> lower-level diff algorithm rather than inside the datasources_open
>>>> function. Do you agree?
>>>>
>>>> It works as it is, but you were talking about wanting it to obey the
>>>> standard token-parsing rules such as ignore white space, and so on. It
>>>> seems that things like this would be much better one level down.
>>>
>>> Yes, I've been struggling with this. But I can't easily see it fit in
>>> the lower levels right now. Problem is that everything in those lower
>>> levels always acts on 1 datasource at a time (getting all the tokens,
>>> inserting them into the token tree, ... then the same for the next
>>> datasource). The datasource_open seemed to me to be the easiest place
>>> to combine datasources to do things for both of them concurrently
>>> (with least impact on the rest of the code).
>>>
>>> Maybe those lower-level functions could also be made multi-datasource,
>>> but I have to think a little bit more about that.
>>
>> I've thought a little bit more about this, going through the code, and
>> yeah, it might be a good idea to push the prefix/suffix scanning into
>> the lower-level parts of the diff-algorithm (the token parsing /
>> building the token tree).
>>
>> Something like:
>> - Make token.c#svn_diff__get_tokens take multiple datasources.
>> - In diff.c, diff3.c and diff4.c replace the multiple calls to this
>>   function to one call passing multiple datasources.
>> - token.c#svn_diff__get_tokens (with multiple datasources) will take
>>   care of identical prefix and suffix scanning based on tokens (so can
>>   take advantage of the standard token-parsing rules with ignore-*
>>   options, by simply calling svn_diff_fns_t.datasource_get_next_token).
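[Editor's illustration] The prefix/suffix scanning discussed above can be sketched on two in-memory buffers. This is only a minimal sketch under stated assumptions, not the patch and not libsvn_diff code: the function names count_identical_prefix_lines and identical_suffix_len are hypothetical, tokens are assumed to be plain newline-terminated lines, and no ignore-* options are applied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of libsvn_diff): count the complete,
 * newline-terminated lines that A and B share at their start.
 * *PREFIX_LEN receives the byte length of that shared prefix. */
size_t
count_identical_prefix_lines(const char *a, size_t a_len,
                             const char *b, size_t b_len,
                             size_t *prefix_len)
{
  size_t min_len = a_len < b_len ? a_len : b_len;
  size_t lines = 0;
  size_t i;

  assert(a != NULL && b != NULL && prefix_len != NULL);
  *prefix_len = 0;
  for (i = 0; i < min_len && a[i] == b[i]; i++)
    {
      if (a[i] == '\n')
        {
          /* Only whole lines count towards the skipped prefix. */
          lines++;
          *prefix_len = i + 1;
        }
    }
  return lines;
}

/* Hypothetical helper: byte length of the identical suffix of A and B,
 * restricted to whole lines and never overlapping the first PREFIX_LEN
 * bytes that were already skipped. */
size_t
identical_suffix_len(const char *a, size_t a_len,
                     const char *b, size_t b_len,
                     size_t prefix_len)
{
  size_t min_len = a_len < b_len ? a_len : b_len;
  size_t suffix_len = 0;
  size_t k = 0;

  assert(a != NULL && b != NULL && prefix_len <= min_len);
  while (k < min_len - prefix_len && a[a_len - 1 - k] == b[b_len - 1 - k])
    {
      k++;
      /* Accept K bytes only if the suffix starts on a line boundary
       * in both buffers. */
      if ((a_len - k == prefix_len || a[a_len - k - 1] == '\n')
          && (b_len - k == prefix_len || b[b_len - k - 1] == '\n'))
        suffix_len = k;
    }
  return suffix_len;
}
```

Only the bytes between the prefix and the suffix would then need to go through the regular token tree and LCS steps; the number of skipped prefix lines corresponds to what the log message calls prefix_lines.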
> One of the improvements we're looking for is to make use of the already
> existing diff options - ignore-white-space etc.
>
> We can get that improvement with a much smaller change: simply by
> calling the 'get_next_token' routine, or rather a part of it, from
> within your current 'find_identical_prefix' function, without touching
> any of the generic diffN.c/token.c code and APIs.
>
>> Some things needed to support this:
>> - svn_diff_fns_t.datasource_get_next_token: calculation of the hash
>>   should be made conditional (I don't want to waste time for the
>>   adler32 hash for lines that are not going to be put in the token
>>   tree).
>
> Yes. If you take this smaller change approach, then the way to do this
> would be to factor out the non-adler part of 'datasource_get_next_token'
> into a separate private function, and call that.
>
>> - For suffix scanning, I'll need another variation of
>>   datasource_get_next_token to get tokens from the end going backwards
>>   (say datasource_get_previous_token). No hash needed.
>
> Yes.
>
>> Does that make sense? Or am I missing it completely?
>
> I can't really comment on the question of moving this right down into
> the diffN.c/token.c code, as I don't have time to study that right now.
> The possible benefits I can see are:
>
> * Sharing this feature across all kinds of diff implementations
>   (currently: file and in-memory-string). Good from a practical POV if
>   and only if really long strings are being compared in memory. Good
>   from a code-structural POV.

Yes, that seems like a nice improvement, code-structurally. I don't know if it will have noticeable performance impact. If I understand the code correctly, the in-memory-diff code is used for diffing properties. Some properties can be quite large (e.g. svn:mergeinfo), but I don't know if they are large enough to have an impact (unless some high-level operations perform a million property diffs or something).
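[Editor's illustration] The backwards variant mentioned above (tentatively called datasource_get_previous_token in the thread) could look roughly like this on an in-memory buffer. A minimal sketch under assumptions: previous_line_start is a hypothetical name, tokens are plain lines separated by '\n', and, as the thread suggests for suffix scanning, no hash is computed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a libsvn_diff API): given BUF and END, the
 * offset just past the end of a line (or the end of the buffer), return
 * the offset at which that line starts. No hash is computed. */
size_t
previous_line_start(const char *buf, size_t end)
{
  size_t pos = end;

  assert(buf != NULL);
  /* Step over the newline that terminates the line, if there is one. */
  if (pos > 0 && buf[pos - 1] == '\n')
    pos--;
  /* Walk back to just after the preceding newline, or to offset 0. */
  while (pos > 0 && buf[pos - 1] != '\n')
    pos--;
  return pos;
}
```

Calling it repeatedly, feeding each result back in as END, walks the lines from last to first; a suffix scan would do this on both datasources in lockstep and stop at the first pair of lines that differ.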
> * I'm curious about whether it would be possible to integrate prefix
>   skipping into the main algorithm in such a way that the algorithm
>   would see the skipped prefix as a possible match, just like any other
>   chunk (including its adler32), but in a much quicker way than the
>   algorithm currently finds such a prefix. If so, it would be able to
>   handle better the cases where one file has added a prefix that is a
>   duplicate of existing text. Same for the suffix.

Maybe that's possible, but I don't think it will help much. For one thing, it introduces some overhead (adler32 calculation of the prefix). And it will only help if the added piece of text is *exactly* the same as the prefix (every line of it). If the added piece of text misses the last line of the prefix, it will not match (different adler32 hash). If you need to be able to match at different spots inside the prefix, you'll have to hash (and compare) every line separately, which voids the benefit of this optimization.

However, I've had another idea for optimization, which I
Re: svn commit: r1024442 - in /subversion/site/publish: index.html site-nav.html
On Tue, Oct 19, 2010 at 6:28 PM, Greg Stein gst...@gmail.com wrote:
> I'm not sure that we need a Foundation link in the site navigation
> menu. That is well-covered in the new prose you added.
>
> Also, I'd like to remove the www.apache.org link from the site
> navigation. We have a link in the upper-right, at the bottom-left, and
> in the main text of the landing page. Those locations are more than
> enough for the guidelines.
>
> The idea here is that we can shrink that left-nav to its minimum.

Agreed. The paragraph of prose on the landing page meets the guidelines, so let's not clutter the rest of the nav bar.

-Hyrum

> Thoughts?
>
> Cheers,
> -g
Re: svn commit: r1024416 - /subversion/site/publish/index.html
I know that Subversion is registered, but I'm unsure as to whether Apache Subversion is, so I just went with the guidelines as stated. If somebody wants to clear this up with trademarks@, go for it.

-Hyrum

On Tue, Oct 19, 2010 at 6:20 PM, Greg Stein gst...@gmail.com wrote:
> I think that we want &reg; instead, since we actually have a
> *registered* trademark, unlike all other Apache projects.
Re: svn commit: r1024480 - /subversion/site/publish/site-nav.html
On 10/19/2010 09:22 PM, cmpil...@apache.org wrote:
> Author: cmpilato
> Date: Wed Oct 20 01:22:55 2010
> New Revision: 1024480
>
> URL: http://svn.apache.org/viewvc?rev=1024480&view=rev
> Log:
> * site/publish/site-nav.html
>   Lose target=_blank bits from some links -- those aren't valid in XHTML 1.1.

For a possible alternative approach, see the attached patch. It only affects index.html, but all the other site pages would need to be tweaked likewise for stuff to work.

Disclaimer: JavaScript ain't my cup o' tea.

Index: script/site.js
===================================================================
--- script/site.js (revision 0)
+++ script/site.js (working copy)
@@ -0,0 +1,40 @@
+/* site.js --- miscellaneous JavaScript utilities.
+ *
+ *
+ *    Licensed to the Apache Software Foundation (ASF) under one
+ *    or more contributor license agreements.  See the NOTICE file
+ *    distributed with this work for additional information
+ *    regarding copyright ownership.  The ASF licenses this file
+ *    to you under the Apache License, Version 2.0 (the
+ *    "License"); you may not use this file except in compliance
+ *    with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *    Unless required by applicable law or agreed to in writing,
+ *    software distributed under the License is distributed on an
+ *    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *    KIND, either express or implied.  See the License for the
+ *    specific language governing permissions and limitations
+ *    under the License.
+ *
+ */
+
+/* Install an onClick() handler for <a> tags with the "linkaway"
+ * class which opens the link in a new window.
+ */
+function addLinkawayEvents() {
+  // Make sure we have the necessary routines.
+  if (! document.getElementsByTagName)
+    return;
+
+  var anchors = document.getElementsByTagName('a');
+  for (var i = 0; i < anchors.length; i++) {
+    var anchor = anchors[i];
+    if (anchor.className.search(/\blinkaway\b/) != -1) {
+      if (! anchor.onclick ) {
+        anchor.onclick = function(anchor) { window.open(this.href); return false; };
+      }
+    }
+  }
+}

Property changes on: script/site.js
___________________________________________________________________
Added: svn:mime-type
## -0,0 +1 ##
+application/javascript

Index: index.html
===================================================================
--- index.html (revision 1024477)
+++ index.html (working copy)
@@ -4,12 +4,13 @@
 <head>
 <title>Apache Subversion</title>
 <meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
+<script src="/script/site.js"></script>
 <style type="text/css">
   @import url("/style/site.css");
 </style>
 </head>
-<body>
+<body onload="addLinkawayEvents(); return false;">
 <!--#include virtual="/site-banner.html" -->
 <!--#include virtual="/site-nav.html" -->
 <div id="site-content">
Re: FSv2 (was: FREE Apache Subversion Meetup...)
>> On Tue, 2010-10-19 at 04:31 -0400, Greg Stein wrote:
>> The FSFS backend was dropped in as a fait d'accompli.

[Greg Hudson]
> A minor correction: ra_svn was dropped in as a fait d'accompli.

Another minor correction, or perhaps a minor minor-correction correction: fair accompli (literally finished work) has no d'. (:
Re: FSv2 (was: FREE Apache Subversion Meetup...)
[Peter Samuelson]
> Another minor correction, or perhaps a minor minor-correction
> correction: fair accompli (literally finished work) has no d'.

_fait_ accompli! Of course when I get pedantic I misspell.

http://linuxmafia.com/~rick/lexicon.html#moenslaw-corrections
Re: svn commit: r1024416 - /subversion/site/publish/index.html
Apache Subversion is not registered. It *is* a trademark of the Foundation. Only Subversion is registered.

I'll get a clarification...

On Tue, Oct 19, 2010 at 20:37, Hyrum K. Wright hyrum_wri...@mail.utexas.edu wrote:
> I know that Subversion is registered, but I'm unsure as to whether
> Apache Subversion is, so I just went with the guidelines as stated. If
> somebody wants to clear this up with trademarks@, go for it.
>
> -Hyrum
>
> On Tue, Oct 19, 2010 at 6:20 PM, Greg Stein gst...@gmail.com wrote:
>> I think that we want &reg; instead, since we actually have a
>> *registered* trademark, unlike all other Apache projects.