Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
Branko Čibej wrote on Mon, Dec 10, 2012 at 00:26:20 +0100: On 10.12.2012 00:08, Johan Corveleyn wrote: On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf d...@daniel.shahaf.name wrote: Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100: 2) Am I the only one who wants to protect his repository against this corruption? Judging from [1], I don't think so. It doesn't make sense that everyone starts writing this pre-commit hook, for something that IMHO is an obvious anti-corruption protection. I think every repository out there deserves to be protected against this. FWIW, I suggested a hook because you could implement that in about 5 minutes, whereas doing a C code change would take at least 10 times that (and several weeks if you have to wait for it to appear in a 1.7.x release that you can install at $WORK). I won't object to C code verifying the svn:eol-style invariant. Thanks. And your pre-commit hook example is much appreciated. For the moment I get the impression that it's not really doable / desirable to implement this in the repository. At least until now no-one has suggested how it could be done, and I don't know enough myself about the server-side / back-end to figure it out :-). The first obstacle is that the server does not interpret properties.[1] Therefore, you'd have to implement this check at transaction-commit time, because there's no earlier point where you're guaranteed to have all node properties at their final values. This implies that the time to reject a commit would be proportional to the size of the commit (which isn't the case when it comes to conflict detection). Can't you do what stefan2 suggested, but in libsvn_repos? The repos layer commit editor would remember for each file whether its textdelta has added a new 0x0D byte, then at close_edit() it would iterate all files in the commit --- and for each file only compare its svn:eol-style property to the by-now-precomputed did it it contain a 0x0D bit. I'm not sure how efficient it would be to parse for 0x0D's in libsvn_repos, though. Maybe we should make this optional. All properties are interpreted by clients. A buggy client will cause cause problems for other users, so the best course of action is to report the bug (to SvnKit in this case?). Also note that you're using a much too strong term when you talk about corrupted files in this case. There's nothing corrupted about them. An invariant failed to maintain. Granted it's not an FS_level invariant, though. -- Brane [1] Not strictly true since mod_dav_svn will look at svn:mime-type when serving the content at its default, unversioned URL; but it doesn't actually interpret the value. -- Branko Čibej Director of Subversion | WANdisco | www.wandisco.com
Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
On 10.12.2012 11:31, Daniel Shahaf wrote: Branko Čibej wrote on Mon, Dec 10, 2012 at 00:26:20 +0100: On 10.12.2012 00:08, Johan Corveleyn wrote: On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf d...@daniel.shahaf.name wrote: Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100: 2) Am I the only one who wants to protect his repository against this corruption? Judging from [1], I don't think so. It doesn't make sense that everyone starts writing this pre-commit hook, for something that IMHO is an obvious anti-corruption protection. I think every repository out there deserves to be protected against this. FWIW, I suggested a hook because you could implement that in about 5 minutes, whereas doing a C code change would take at least 10 times that (and several weeks if you have to wait for it to appear in a 1.7.x release that you can install at $WORK). I won't object to C code verifying the svn:eol-style invariant. Thanks. And your pre-commit hook example is much appreciated. For the moment I get the impression that it's not really doable / desirable to implement this in the repository. At least until now no-one has suggested how it could be done, and I don't know enough myself about the server-side / back-end to figure it out :-). The first obstacle is that the server does not interpret properties.[1] Therefore, you'd have to implement this check at transaction-commit time, because there's no earlier point where you're guaranteed to have all node properties at their final values. This implies that the time to reject a commit would be proportional to the size of the commit (which isn't the case when it comes to conflict detection). Can't you do what stefan2 suggested, but in libsvn_repos? The repos layer commit editor would remember for each file whether its textdelta has added a new 0x0D byte, then at close_edit() it would iterate all files in the commit --- and for each file only compare its svn:eol-style property to the by-now-precomputed did it it contain a 0x0D bit. I'm not sure how efficient it would be to parse for 0x0D's in libsvn_repos, though. Maybe we should make this optional. It's not enough to just look for \r in the delta stream. Certainly wouldn't help with historically broken files. Moreover, if libsvn_repos started looking at svn:eol-style, that'd only make sense if you verified all normalizations, not just native. And why should it just be svn:eol-style, when svn:keywords has potentially the same problem? Eventually you start verifying everything that affects file contents. Maybe that's a good idea, but /why/ put it in libsvn_repos when it's much easier and cleaner to just provide some standard hooks, written in C and distributed with releases, that the admin plug in if she feels like it? Surely that's what hooks are for. -- Brane -- Branko Čibej Director of Subversion | WANdisco | www.wandisco.com
Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
On Mon, Dec 10, 2012 at 2:51 PM, Branko Čibej br...@wandisco.com wrote: On 10.12.2012 11:31, Daniel Shahaf wrote: Branko Čibej wrote on Mon, Dec 10, 2012 at 00:26:20 +0100: On 10.12.2012 00:08, Johan Corveleyn wrote: On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf d...@daniel.shahaf.name wrote: Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100: 2) Am I the only one who wants to protect his repository against this corruption? Judging from [1], I don't think so. It doesn't make sense that everyone starts writing this pre-commit hook, for something that IMHO is an obvious anti-corruption protection. I think every repository out there deserves to be protected against this. FWIW, I suggested a hook because you could implement that in about 5 minutes, whereas doing a C code change would take at least 10 times that (and several weeks if you have to wait for it to appear in a 1.7.x release that you can install at $WORK). I won't object to C code verifying the svn:eol-style invariant. Thanks. And your pre-commit hook example is much appreciated. For the moment I get the impression that it's not really doable / desirable to implement this in the repository. At least until now no-one has suggested how it could be done, and I don't know enough myself about the server-side / back-end to figure it out :-). The first obstacle is that the server does not interpret properties.[1] Therefore, you'd have to implement this check at transaction-commit time, because there's no earlier point where you're guaranteed to have all node properties at their final values. This implies that the time to reject a commit would be proportional to the size of the commit (which isn't the case when it comes to conflict detection). Can't you do what stefan2 suggested, but in libsvn_repos? The repos layer commit editor would remember for each file whether its textdelta has added a new 0x0D byte, then at close_edit() it would iterate all files in the commit --- and for each file only compare its svn:eol-style property to the by-now-precomputed did it it contain a 0x0D bit. I'm not sure how efficient it would be to parse for 0x0D's in libsvn_repos, though. Maybe we should make this optional. It's not enough to just look for \r in the delta stream. Certainly wouldn't help with historically broken files. Moreover, if libsvn_repos started looking at svn:eol-style, that'd only make sense if you verified all normalizations, not just native. And why should it just be svn:eol-style, when svn:keywords has potentially the same problem? Eventually you start verifying everything that affects file contents. Maybe that's a good idea, but /why/ put it in libsvn_repos when it's much easier and cleaner to just provide some standard hooks, written in C and distributed with releases, that the admin plug in if she feels like it? Surely that's what hooks are for. New program 'svnhooks' with set of hooks for standard recommended polices would be great. svn:eol-style check is good start. -- Ivan Zhakov
Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
Branko Čibej wrote on Mon, Dec 10, 2012 at 11:51:08 +0100: On 10.12.2012 11:31, Daniel Shahaf wrote: Branko Čibej wrote on Mon, Dec 10, 2012 at 00:26:20 +0100: On 10.12.2012 00:08, Johan Corveleyn wrote: On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf d...@daniel.shahaf.name wrote: Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100: 2) Am I the only one who wants to protect his repository against this corruption? Judging from [1], I don't think so. It doesn't make sense that everyone starts writing this pre-commit hook, for something that IMHO is an obvious anti-corruption protection. I think every repository out there deserves to be protected against this. FWIW, I suggested a hook because you could implement that in about 5 minutes, whereas doing a C code change would take at least 10 times that (and several weeks if you have to wait for it to appear in a 1.7.x release that you can install at $WORK). I won't object to C code verifying the svn:eol-style invariant. Thanks. And your pre-commit hook example is much appreciated. For the moment I get the impression that it's not really doable / desirable to implement this in the repository. At least until now no-one has suggested how it could be done, and I don't know enough myself about the server-side / back-end to figure it out :-). The first obstacle is that the server does not interpret properties.[1] Therefore, you'd have to implement this check at transaction-commit time, because there's no earlier point where you're guaranteed to have all node properties at their final values. This implies that the time to reject a commit would be proportional to the size of the commit (which isn't the case when it comes to conflict detection). Can't you do what stefan2 suggested, but in libsvn_repos? The repos layer commit editor would remember for each file whether its textdelta has added a new 0x0D byte, then at close_edit() it would iterate all files in the commit --- and for each file only compare its svn:eol-style property to the by-now-precomputed did it it contain a 0x0D bit. I'm not sure how efficient it would be to parse for 0x0D's in libsvn_repos, though. Maybe we should make this optional. It's not enough to just look for \r in the delta stream. Certainly wouldn't help with historically broken files. I wasn't trying to fix historically broken files. (That would require a fulltext scan --- that's not a cheap computation.) Do you see another problem? Maybe that's a good idea, but /why/ put it in libsvn_repos when it's much easier and cleaner to just provide some standard hooks, written in C and distributed with releases, that the admin plug in if she feels like it? Surely that's what hooks are for. The argument is that a Subversion server should be enforcing Subversion's invariants. That said, I'm not opposed to doing it via standard hooks. It's a good way to introduce the feature in a way that allows more easily changing it before it hits the APIs and their strict compatibility rules. -- Brane -- Branko Čibej Director of Subversion | WANdisco | www.wandisco.com
Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
On Mon, Dec 10, 2012 at 12:47 PM, Daniel Shahaf d...@daniel.shahaf.name wrote: Branko Čibej wrote on Mon, Dec 10, 2012 at 11:51:08 +0100: On 10.12.2012 11:31, Daniel Shahaf wrote: Branko Čibej wrote on Mon, Dec 10, 2012 at 00:26:20 +0100: On 10.12.2012 00:08, Johan Corveleyn wrote: On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf d...@daniel.shahaf.name wrote: Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100: 2) Am I the only one who wants to protect his repository against this corruption? Judging from [1], I don't think so. It doesn't make sense that everyone starts writing this pre-commit hook, for something that IMHO is an obvious anti-corruption protection. I think every repository out there deserves to be protected against this. FWIW, I suggested a hook because you could implement that in about 5 minutes, whereas doing a C code change would take at least 10 times that (and several weeks if you have to wait for it to appear in a 1.7.x release that you can install at $WORK). I won't object to C code verifying the svn:eol-style invariant. Thanks. And your pre-commit hook example is much appreciated. For the moment I get the impression that it's not really doable / desirable to implement this in the repository. At least until now no-one has suggested how it could be done, and I don't know enough myself about the server-side / back-end to figure it out :-). The first obstacle is that the server does not interpret properties.[1] Therefore, you'd have to implement this check at transaction-commit time, because there's no earlier point where you're guaranteed to have all node properties at their final values. This implies that the time to reject a commit would be proportional to the size of the commit (which isn't the case when it comes to conflict detection). Can't you do what stefan2 suggested, but in libsvn_repos? The repos layer commit editor would remember for each file whether its textdelta has added a new 0x0D byte, then at close_edit() it would iterate all files in the commit --- and for each file only compare its svn:eol-style property to the by-now-precomputed did it it contain a 0x0D bit. I'm not sure how efficient it would be to parse for 0x0D's in libsvn_repos, though. Maybe we should make this optional. It's not enough to just look for \r in the delta stream. Certainly wouldn't help with historically broken files. I wasn't trying to fix historically broken files. (That would require a fulltext scan --- that's not a cheap computation.) Do you see another problem? Indeed, historically broken files are not the focus here. They're broken in the history of the repository anyway, so that's not really fixable anymore. But if we can prevent new broken files from entering the repository, that would be great. Maybe that's a good idea, but /why/ put it in libsvn_repos when it's much easier and cleaner to just provide some standard hooks, written in C and distributed with releases, that the admin plug in if she feels like it? Surely that's what hooks are for. The argument is that a Subversion server should be enforcing Subversion's invariants. That said, I'm not opposed to doing it via standard hooks. It's a good way to introduce the feature in a way that allows more easily changing it before it hits the APIs and their strict compatibility rules. Yes, that would be fine I think. I like Ivan's suggestion of creating a new standard 'svnhooks' program or something like that, which can then be part of standard distributions. -- Johan
RE: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
-Original Message- From: Branko Čibej [mailto:br...@wandisco.com] Sent: maandag 10 december 2012 8:04 To: dev@subversion.apache.org Subject: Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065) On 10.12.2012 07:35, Bert Huijben wrote: I don’t think you have to wait until commit time: You could verify the commit base revision’s properties + changes. In the cases where the properties change before commit, the commit would fail for being out of date. That would imply you can't change properties and contents in the same commit, wouldn't it. Which I suspect is not the case? :) The problem is worse, you can theoretically reopen and modify a transaction any number of times before committing it, so you don't really know until txn commit time which properties actually apply to which file contents. On the fs layer this is probably true, but in a normal editor drive through the ra layers this should never happen. On the ra layer the final set of properties is valid in the file_close() callback. I'm just not sure if this is really an event that the server sees for ra_dav, or if this is implied by the committing. Bert
svnhooks program (was Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065))
On Mon, Dec 10, 2012 at 3:53 PM, Johan Corveleyn jcor...@gmail.com wrote: On Mon, Dec 10, 2012 at 12:47 PM, Daniel Shahaf d...@daniel.shahaf.name wrote: [...] The argument is that a Subversion server should be enforcing Subversion's invariants. That said, I'm not opposed to doing it via standard hooks. It's a good way to introduce the feature in a way that allows more easily changing it before it hits the APIs and their strict compatibility rules. Yes, that would be fine I think. I like Ivan's suggestion of creating a new standard 'svnhooks' program or something like that, which can then be part of standard distributions. I'd like to share my thoughts about 'svnhooks' if someone is going to implement it. Introduce svnhooks program with the separate subcommand for each hook kind: 'pre-commit', 'post-comit', 'pre-lock', 'pre-unlock and etc. The polices for svnhooks program defined in conf\svnhooks.conf file. This makes very easy for user to use svnhooks: just add line 'svnhooks HOOK-TYPE $*' in each hook and tweak svnhooks.conf configuration file. svnhooks configuration file has separate section for each policy. For example: [[[ [check-eol-normalization] enable = yes [rev-propchange] enable = yes users = svnsync [edit-log-message] enable = yes users = admin, @author ]]] -- Ivan Zhakov
Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
Johan Corveleyn wrote on Sun, Dec 09, 2012 at 01:02:55 +0100: Last week a colleague managed to commit a non-LF-normalized svn:eol-style=native file in our repository again. As explained in issue #4065 [1], this causes all kinds of problems. I suspect there might be a bug in SVNKit, some edge-case where it doesn't correctly translate the to-be-committed files to LF-termination. But regardless, I'd like to protect my repository. I don't fully control the clients that people use, and those clients can always have bugs, ... So ... how hard would it be to fix issue #4065, making the server enforce the right eol-ness for correct operation? It's a genuine question, I have no idea :-). for i in $(svnlook changed -t $TXN $REPOS); do if propget = native || propget = LF ; then svnlook cat -t $TXN $REPOS $i | xxd -c1 -p | grep -i 0d fi done And writing that made me realise... contains CR is such a simple condition, that you don't need the fulltext for it --- you would be able to implement it by looking at the literal new text segments within svndiff streams directly. However, I'm not quite sure whether this observation is the key to a simple implementation or to an unnecessarily-complicated one. [1] http://subversion.tigris.org/issues/show_bug.cgi?id=4065 - server should enforce LF normalization for svn:eol-style=native files -- Johan
Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
On 09.12.2012 09:14, Daniel Shahaf wrote: Johan Corveleyn wrote on Sun, Dec 09, 2012 at 01:02:55 +0100: Last week a colleague managed to commit a non-LF-normalized svn:eol-style=native file in our repository again. As explained in issue #4065 [1], this causes all kinds of problems. I suspect there might be a bug in SVNKit, some edge-case where it doesn't correctly translate the to-be-committed files to LF-termination. But regardless, I'd like to protect my repository. I don't fully control the clients that people use, and those clients can always have bugs, ... So ... how hard would it be to fix issue #4065, making the server enforce the right eol-ness for correct operation? It's a genuine question, I have no idea :-). for i in $(svnlook changed -t $TXN $REPOS); do if propget = native || propget = LF ; then svnlook cat -t $TXN $REPOS $i | xxd -c1 -p | grep -i 0d fi done And writing that made me realise... contains CR is such a simple condition, that you don't need the fulltext for it --- you would be able to implement it by looking at the literal new text segments within svndiff streams directly. However, I'm not quite sure whether this observation is the key to a simple implementation or to an unnecessarily-complicated one. [1] http://subversion.tigris.org/issues/show_bug.cgi?id=4065 - server should enforce LF normalization for svn:eol-style=native files The server does not look at the contents of files, or the value of properties. It does not even know that properties /have/ semantics. The only reasonable place to do this would be in a pre-commit hook. Anything else would require a major change in the design of the server and/or libsvn_repos. If the client encounters non-normalized files with svn:eol-style=native and does /not/ properly normalize them regardless, that's a client bug. Though of course the file would immediately show up as modified as soon as it was updated. -- Brane -- Branko Čibej Director of Subversion | WANdisco | www.wandisco.com
Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
On Sun, Dec 9, 2012 at 11:42 AM, Branko Čibej br...@wandisco.com wrote: On 09.12.2012 09:14, Daniel Shahaf wrote: Johan Corveleyn wrote on Sun, Dec 09, 2012 at 01:02:55 +0100: Last week a colleague managed to commit a non-LF-normalized svn:eol-style=native file in our repository again. As explained in issue #4065 [1], this causes all kinds of problems. I suspect there might be a bug in SVNKit, some edge-case where it doesn't correctly translate the to-be-committed files to LF-termination. But regardless, I'd like to protect my repository. I don't fully control the clients that people use, and those clients can always have bugs, ... So ... how hard would it be to fix issue #4065, making the server enforce the right eol-ness for correct operation? It's a genuine question, I have no idea :-). for i in $(svnlook changed -t $TXN $REPOS); do if propget = native || propget = LF ; then svnlook cat -t $TXN $REPOS $i | xxd -c1 -p | grep -i 0d fi done And writing that made me realise... contains CR is such a simple condition, that you don't need the fulltext for it --- you would be able to implement it by looking at the literal new text segments within svndiff streams directly. However, I'm not quite sure whether this observation is the key to a simple implementation or to an unnecessarily-complicated one. [1] http://subversion.tigris.org/issues/show_bug.cgi?id=4065 - server should enforce LF normalization for svn:eol-style=native files The server does not look at the contents of files, or the value of properties. It does not even know that properties /have/ semantics. The only reasonable place to do this would be in a pre-commit hook. Anything else would require a major change in the design of the server and/or libsvn_repos. If the client encounters non-normalized files with svn:eol-style=native and does /not/ properly normalize them regardless, that's a client bug. Though of course the file would immediately show up as modified as soon as it was updated. Bah, that just sounds like the client (any client) should paper over these corrupt files. I don't think that's a good solution. These files are present in the repository in a corrupt form, not following the expected invariant that they should have LF line endings. If you 'svnlook cat' such a file, it will have CRLF line endings, while normal svn:eol-style=native files will always have LF line endings when 'svnlook cat'ed. Any client that diffs that change ('svnlook diff', 'svn diff -c $REV $REPOS', ...), will see a full file diff. They would all have to normalize that corrupt file on the fly to give a good diff like it should have been if it weren't corrupted. Note that the Apache repository itself already contains such corruptions (see [1], the user thread started by Konstantin Kolinko, which prompted me to file the issue). Any large repository will have them, because of the diversity of svn clients and bugs they may contain. These buggy clients ruin the experience for everyone else. I'd like there to be some better protection against this problem, centrally. A pre-commit hook is an option, and I'll probably have to go that route if there is no other way. But I see two problems with it: 1) It's dog slow. Doing an svnlook pg and then potentially svnlook cat for every changed file can easily make the pre-commit hook take longer than the standard client-side timeout of 30 seconds (for neon, dunno about the serf timeout). Just imagine doing a catch-up merge touching 1000 files. Parsing both the properties and the content from one svnlook diff call might be a workaround, but yuck. 2) Am I the only one who wants to protect his repository against this corruption? Judging from [1], I don't think so. It doesn't make sense that everyone starts writing this pre-commit hook, for something that IMHO is an obvious anti-corruption protection. I think every repository out there deserves to be protected against this. [1] http://svn.haxx.se/users/archive-2011-10/0287.shtml -- Johan
Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100: 2) Am I the only one who wants to protect his repository against this corruption? Judging from [1], I don't think so. It doesn't make sense that everyone starts writing this pre-commit hook, for something that IMHO is an obvious anti-corruption protection. I think every repository out there deserves to be protected against this. FWIW, I suggested a hook because you could implement that in about 5 minutes, whereas doing a C code change would take at least 10 times that (and several weeks if you have to wait for it to appear in a 1.7.x release that you can install at $WORK). I won't object to C code verifying the svn:eol-style invariant. [1] http://svn.haxx.se/users/archive-2011-10/0287.shtml -- Johan
Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf d...@daniel.shahaf.name wrote: Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100: 2) Am I the only one who wants to protect his repository against this corruption? Judging from [1], I don't think so. It doesn't make sense that everyone starts writing this pre-commit hook, for something that IMHO is an obvious anti-corruption protection. I think every repository out there deserves to be protected against this. FWIW, I suggested a hook because you could implement that in about 5 minutes, whereas doing a C code change would take at least 10 times that (and several weeks if you have to wait for it to appear in a 1.7.x release that you can install at $WORK). I won't object to C code verifying the svn:eol-style invariant. Thanks. And your pre-commit hook example is much appreciated. For the moment I get the impression that it's not really doable / desirable to implement this in the repository. At least until now no-one has suggested how it could be done, and I don't know enough myself about the server-side / back-end to figure it out :-). (side-note: I don't want to rush anything and get it backported to a 1.7.x release. For the moment I can manage (I watch the problem from an asynchronous post-commit hook, and fix it as soon as it happens, which is only a couple of times a year in our repos). But I'm interested in a long-term solution to this, so this wart can be eliminated :-)). As for my 1st concern, about the slowness of the pre-commit hook validation, it might be possible to make it faster if both 'svnlook pg' and 'svnlook cat' would support multiple targets. -- Johan
Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
Johan Corveleyn wrote on Mon, Dec 10, 2012 at 00:08:26 +0100: On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf d...@daniel.shahaf.name wrote: Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100: 2) Am I the only one who wants to protect his repository against this corruption? Judging from [1], I don't think so. It doesn't make sense that everyone starts writing this pre-commit hook, for something that IMHO is an obvious anti-corruption protection. I think every repository out there deserves to be protected against this. FWIW, I suggested a hook because you could implement that in about 5 minutes, whereas doing a C code change would take at least 10 times that (and several weeks if you have to wait for it to appear in a 1.7.x release that you can install at $WORK). I won't object to C code verifying the svn:eol-style invariant. Thanks. And your pre-commit hook example is much appreciated. For the moment I get the impression that it's not really doable / desirable to implement this in the repository. At least until now no-one has suggested how it could be done, and I don't know enough myself about the server-side / back-end to figure it out :-). Well, as Brane said the logic should live in libsvn_repos. I would guess the code path of interest is the commit editor. (The property validations live in svn_repos__validate_prop(), in fs-wrap.c.) (side-note: I don't want to rush anything and get it backported to a 1.7.x release. For the moment I can manage (I watch the problem from an asynchronous post-commit hook, and fix it as soon as it happens, which is only a couple of times a year in our repos). But I'm interested in a long-term solution to this, so this wart can be eliminated :-)). Well, just extend the wart to also fix it as soon as it happens: % svn cat $FILE | dos2unix | svnmucc -mm put /dev/stdin $FILE (BTW, there's a race condition in this line --- it's not guaranteed that 'cat' and 'put' open the RA session to the same base revision) As for my 1st concern, about the slowness of the pre-commit hook validation, it might be possible to make it faster if both 'svnlook pg' and 'svnlook cat' would support multiple targets. Or if you wrote your hook script using the Python bindings. (only one svn_fs_open call, intra-process caches, ...) -- Johan
Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
On 10.12.2012 00:08, Johan Corveleyn wrote: On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf d...@daniel.shahaf.name wrote: Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100: 2) Am I the only one who wants to protect his repository against this corruption? Judging from [1], I don't think so. It doesn't make sense that everyone starts writing this pre-commit hook, for something that IMHO is an obvious anti-corruption protection. I think every repository out there deserves to be protected against this. FWIW, I suggested a hook because you could implement that in about 5 minutes, whereas doing a C code change would take at least 10 times that (and several weeks if you have to wait for it to appear in a 1.7.x release that you can install at $WORK). I won't object to C code verifying the svn:eol-style invariant. Thanks. And your pre-commit hook example is much appreciated. For the moment I get the impression that it's not really doable / desirable to implement this in the repository. At least until now no-one has suggested how it could be done, and I don't know enough myself about the server-side / back-end to figure it out :-). The first obstacle is that the server does not interpret properties.[1] Therefore, you'd have to implement this check at transaction-commit time, because there's no earlier point where you're guaranteed to have all node properties at their final values. This implies that the time to reject a commit would be proportional to the size of the commit (which isn't the case when it comes to conflict detection). All properties are interpreted by clients. A buggy client will cause cause problems for other users, so the best course of action is to report the bug (to SvnKit in this case?). Also note that you're using a much too strong term when you talk about corrupted files in this case. There's nothing corrupted about them. -- Brane [1] Not strictly true since mod_dav_svn will look at svn:mime-type when serving the content at its default, unversioned URL; but it doesn't actually interpret the value. -- Branko Čibej Director of Subversion | WANdisco | www.wandisco.com
Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
On 10.12.2012 00:21, Daniel Shahaf wrote: As for my 1st concern, about the slowness of the pre-commit hook validation, it might be possible to make it faster if both 'svnlook pg' and 'svnlook cat' would support multiple targets. Or if you wrote your hook script using the Python bindings. (only one svn_fs_open call, intra-process caches, ...) Or you can write that pre-commit hook in C, after all, using our very own libraries. It'll take a bit longer than doing it in Python, but much less than trying to shoehorn something into libsvn_repos that wasn't designed to be there. (N.B.: Even if it /was/ in libsvn_repos, it'd just get called at the same time as the pre-commit hook; so you'd save a fork+exec and a few open calls, but not that much more.) -- Brane -- Branko Čibej Director of Subversion | WANdisco | www.wandisco.com
Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
On Mon, Dec 10, 2012 at 12:26 AM, Branko Čibej br...@wandisco.com wrote: On 10.12.2012 00:08, Johan Corveleyn wrote: On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf d...@daniel.shahaf.name wrote: Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100: 2) Am I the only one who wants to protect his repository against this corruption? Judging from [1], I don't think so. It doesn't make sense that everyone starts writing this pre-commit hook, for something that IMHO is an obvious anti-corruption protection. I think every repository out there deserves to be protected against this. FWIW, I suggested a hook because you could implement that in about 5 minutes, whereas doing a C code change would take at least 10 times that (and several weeks if you have to wait for it to appear in a 1.7.x release that you can install at $WORK). I won't object to C code verifying the svn:eol-style invariant. Thanks. And your pre-commit hook example is much appreciated. For the moment I get the impression that it's not really doable / desirable to implement this in the repository. At least until now no-one has suggested how it could be done, and I don't know enough myself about the server-side / back-end to figure it out :-). The first obstacle is that the server does not interpret properties.[1] Therefore, you'd have to implement this check at transaction-commit time, because there's no earlier point where you're guaranteed to have all node properties at their final values. This implies that the time to reject a commit would be proportional to the size of the commit (which isn't the case when it comes to conflict detection). Well, if we really want to implement such a feature, we may as well be a bit more clever about it: While writing incoming file contents to the protrev file, we may determine its NL style as we go and store the result. The latter could then be compared to the props setting during commit time. -- Stefan^2. -- Certified Supported Apache Subversion Downloads: * http://www.wandisco.com/subversion/download *
Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
On Mon, Dec 10, 2012 at 12:26 AM, Branko Čibej br...@wandisco.com wrote: On 10.12.2012 00:08, Johan Corveleyn wrote: On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf d...@daniel.shahaf.name wrote: Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100: 2) Am I the only one who wants to protect his repository against this corruption? Judging from [1], I don't think so. It doesn't make sense that everyone starts writing this pre-commit hook, for something that IMHO is an obvious anti-corruption protection. I think every repository out there deserves to be protected against this. FWIW, I suggested a hook because you could implement that in about 5 minutes, whereas doing a C code change would take at least 10 times that (and several weeks if you have to wait for it to appear in a 1.7.x release that you can install at $WORK). I won't object to C code verifying the svn:eol-style invariant. Thanks. And your pre-commit hook example is much appreciated. For the moment I get the impression that it's not really doable / desirable to implement this in the repository. At least until now no-one has suggested how it could be done, and I don't know enough myself about the server-side / back-end to figure it out :-). The first obstacle is that the server does not interpret properties.[1] Therefore, you'd have to implement this check at transaction-commit time, because there's no earlier point where you're guaranteed to have all node properties at their final values. This implies that the time to reject a commit would be proportional to the size of the commit (which isn't the case when it comes to conflict detection). All properties are interpreted by clients. A buggy client will cause cause problems for other users, so the best course of action is to report the bug (to SvnKit in this case?). That's actually pretty hard, because I can't reproduce the issue. I only suspect svnkit because I've only seen it (at our company) with commits coming from IntelliJ IDEA, which uses svnkit under the hood. Another known offender is git-svn, because they don't care about translating line termination. So if you use git-svn on Windows, you might end up committing an eol-style=native file with CRLF's in to the repository. Also note that you're using a much too strong term when you talk about corrupted files in this case. There's nothing corrupted about them. Well, it does mess up all diffs (client-side, server side for post-commit mails, ...) made against that revision. Plus it causes these files to be marked as modified in working copies of other users ... sometimes (when their timestamps mismatch with their timestamp-in-wc.db, so the content will be checksummed -- if the timestamps are still ok, it will not show up as modified). This in turn can cause totally confusing tree conflicts when a parent dir is moved (confusing because there's a local edit that the user knows nothing about). But okay, corrupt may be exaggerating a bit :-) ... -- Brane [1] Not strictly true since mod_dav_svn will look at svn:mime-type when serving the content at its default, unversioned URL; but it doesn't actually interpret the value. -- Branko Čibej Director of Subversion | WANdisco | www.wandisco.com -- Johan
Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
On 10.12.2012 01:24, Stefan Fuhrmann wrote: Well, if we really want to implement such a feature, we may as well be a bit more clever about it: While writing incoming file contents to the protrev file, we may determine its NL style as we go and store the result. The latter could then be compared to the props setting during commit time. Premature optimization. We haven't even decided we want this in libsvn_repos, let alone libsvn_fs_fs, which is the only place where protorev file is even a meaningful term. -- Brane -- Branko Čibej Director of Subversion | WANdisco | www.wandisco.com
Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
On 10.12.2012 01:25, Johan Corveleyn wrote: Another known offender is git-svn, because they don't care about translating line termination. So if you use git-svn on Windows, you might end up committing an eol-style=native file with CRLF's in to the repository. Yuck. We should shout *really* loudly out to the world about broken git-svn then, if it doesn't honour our wire protocol. This might even get me to lean towards rejecting these kinds of commits by default. -- Brane -- Branko Čibej Director of Subversion | WANdisco | www.wandisco.com
RE: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
I don’t think you have to wait until commit time: You could verify the commit base revision’s properties + changes. In the cases where the properties change before commit, the commit would fail for being out of date. Bert *From:* Branko Čibej *Sent:* December 10, 2012 12:26 AM *To:* dev@subversion.apache.org *Subject:* Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065) On 10.12.2012 00:08, Johan Corveleyn wrote: On Sun, Dec 9, 2012 at 11:43 PM, Daniel Shahaf d...@daniel.shahaf.name wrote: Johan Corveleyn wrote on Sun, Dec 09, 2012 at 21:15:24 +0100: 2) Am I the only one who wants to protect his repository against this corruption? Judging from [1], I don't think so. It doesn't make sense that everyone starts writing this pre-commit hook, for something that IMHO is an obvious anti-corruption protection. I think every repository out there deserves to be protected against this. FWIW, I suggested a hook because you could implement that in about 5 minutes, whereas doing a C code change would take at least 10 times that (and several weeks if you have to wait for it to appear in a 1.7.x release that you can install at $WORK). I won't object to C code verifying the svn:eol-style invariant. Thanks. And your pre-commit hook example is much appreciated. For the moment I get the impression that it's not really doable / desirable to implement this in the repository. At least until now no-one has suggested how it could be done, and I don't know enough myself about the server-side / back-end to figure it out :-). The first obstacle is that the server does not interpret properties.[1] Therefore, you'd have to implement this check at transaction-commit time, because there's no earlier point where you're guaranteed to have all node properties at their final values. This implies that the time to reject a commit would be proportional to the size of the commit (which isn't the case when it comes to conflict detection). All properties are interpreted by clients. A buggy client will cause cause problems for other users, so the best course of action is to report the bug (to SvnKit in this case?). Also note that you're using a much too strong term when you talk about corrupted files in this case. There's nothing corrupted about them. -- Brane [1] Not strictly true since mod_dav_svn will look at svn:mime-type when serving the content at its default, unversioned URL; but it doesn't actually interpret the value. -- Branko Čibej Director of Subversion | WANdisco | www.wandisco.com
Re: enforcing LF-normalization for svn:eol-style=native files (issue #4065)
On 10.12.2012 07:35, Bert Huijben wrote: I don’t think you have to wait until commit time: You could verify the commit base revision’s properties + changes. In the cases where the properties change before commit, the commit would fail for being out of date. That would imply you can't change properties and contents in the same commit, wouldn't it. Which I suspect is not the case? :) The problem is worse, you can theoretically reopen and modify a transaction any number of times before committing it, so you don't really know until txn commit time which properties actually apply to which file contents. -- Brane -- Branko Čibej Director of Subversion | WANdisco | www.wandisco.com
enforcing LF-normalization for svn:eol-style=native files (issue #4065)
Last week a colleague managed to commit a non-LF-normalized svn:eol-style=native file in our repository again. As explained in issue #4065 [1], this causes all kinds of problems. I suspect there might be a bug in SVNKit, some edge-case where it doesn't correctly translate the to-be-committed files to LF-termination. But regardless, I'd like to protect my repository. I don't fully control the clients that people use, and those clients can always have bugs, ... So ... how hard would it be to fix issue #4065, making the server enforce the right eol-ness for correct operation? It's a genuine question, I have no idea :-). [1] http://subversion.tigris.org/issues/show_bug.cgi?id=4065 - server should enforce LF normalization for svn:eol-style=native files -- Johan