Re: Seeking help for E000054: Error retrieving REPORT: Connection reset by peer

2014-01-15 Thread Branko Čibej
On 09.01.2014 20:19, Branko Čibej wrote:
 On 09.01.2014 17:09, Mojca Miklavec wrote:
 I'm unable to reproduce the faulty behaviour if I do a checkout from
 the same network where the server is located, no matter what I try
 (upgrading SVN client doesn't help triggering the error). Philip
 also said that he had no problem doing a checkout with client version
 1.8.5 or 1.7.

 This confirms my suspicion that the error is triggered by some part of
 the network infrastructure between your server and the outside world.
 That's why I asked if there is a load-balancer involved. It could also
 be caused by some kind of transparent proxy, or even a packet
 analyzer. I doubt that your server is open to the world without some
 kind of security measures in place.

 To be clear, I'm not saying that any of these things are configured
 incorrectly; only that they may be interacting with Subversion in a
 way that we don't handle well. One of the major differences between
 1.7 (which works) and 1.8 (which fails) is that we try to work around
 issues with non-standard behaviour of certain transparent (sic)
 proxies; and we can't claim to have covered all the possibilities.

 I can't see a way to figure out what's going on without help from your
 network admins; we need some insight into why the connection is being
 reset on the server side, and analyzing the TCP stream on the client
 can't tell us that.


 BTW, if you think it'd help to try a live debugging session, I'm only
 about an hour's drive away from IJS.

So to wrap this up: they managed to fix the problem themselves, and it
was indeed some part of the network infrastructure; the specifics are
as follows:

They have a Cisco ASA 5580 running IOS 9.1(4), and they had HTTP
protocol inspection turned on; the configuration was as follows:

policy-map type inspect http HTTP-CONTROL
 parameters
  protocol-violation action log
policy-map global_policy
 class inspection_default
  inspect http HTTP-CONTROL


The ASA was closing the connections, and their logs contained one of the
following two reasons:

%ASA-4-415016: policy-map HTTP-CONTROL:Maximum number of unanswered HTTP requests exceeded - Resetting connection from Ext:x.x.x.x/59769 to Int:y.y.y.y/80
%ASA-4-507003: tcp flow from Ext:x.x.x.x/59769 to Int:y.y.y.y/80 terminated by inspection engine, reason - reset unconditionally.
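
For anyone who wants to scan their own ASA logs for the same two messages, here is a minimal sketch of a parser. The regular expressions are inferred only from the two sample lines quoted above (each sample joined onto a single line), so the field layout is an assumption and other ASA messages may well look different. The names `parse_asa_line` and `SAMPLES` are made up for this example.

```python
import re

# "%ASA-<severity>-<message id>: <text>", as seen in the two quoted samples.
ASA_LINE = re.compile(r"%ASA-(?P<severity>\d)-(?P<msg_id>\d{6}):\s*(?P<text>.*)")

# "from <if>:<addr>/<port> to <if>:<addr>/<port>", also taken from the samples.
FLOW = re.compile(
    r"from (?P<src_if>\w+):(?P<src>[^/\s]+)/(?P<src_port>\d+) "
    r"to (?P<dst_if>\w+):(?P<dst>[^/\s]+)/(?P<dst_port>\d+)"
)


def parse_asa_line(line):
    """Return a dict of fields from one log line, or None if it does not match."""
    head = ASA_LINE.search(line)
    if head is None:
        return None
    fields = head.groupdict()
    flow = FLOW.search(fields["text"])
    if flow is not None:
        fields.update(flow.groupdict())
    return fields


# The two messages from the thread, each joined onto one line.
SAMPLES = [
    "%ASA-4-415016: policy-map HTTP-CONTROL:Maximum number of unanswered HTTP "
    "requests exceeded - Resetting connection from Ext:x.x.x.x/59769 to "
    "Int:y.y.y.y/80",
    "%ASA-4-507003: tcp flow from Ext:x.x.x.x/59769 to Int:y.y.y.y/80 "
    "terminated by inspection engine, reason - reset unconditionally.",
]

if __name__ == "__main__":
    for sample in SAMPLES:
        parsed = parse_asa_line(sample)
        print(parsed["msg_id"], parsed["src_port"], "->", parsed["dst_port"])
```

Filtering a log on message IDs 415016 and 507003 with destination port 80 should be enough to tell whether the inspection engine is the component resetting the connections.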


The only reasonable explanation we could come up with was that
moderately low bandwidth and high latency between client and server,
coupled with the fact that some of the files in the repository are
rather large and take a while to transfer, caused the 1.8 client to
queue up enough pipelined GET requests during checkout that the firewall
decided to call it quits. A 1.7 client (using serf) did not exhibit this
problem, because it also sends PROPFINDs, and this apparently changed the
timing enough that the number of pipelined requests never exceeded the
ASA's configured maximum.
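
To make the "too many unanswered requests" idea concrete, here is a toy model, not a description of what the ASA actually does. It only counts how many requests are awaiting a response at any moment for a given schedule. The limit of 10 is invented for illustration (the real configured maximum is not stated in this thread), and both schedules are made up rather than taken from a capture.

```python
def peak_unanswered(events):
    """Largest number of requests awaiting a response at any moment.

    events is an iterable of (time, kind) pairs, where kind is either
    "request" or "response".
    """
    outstanding = 0
    peak = 0
    # At equal timestamps handle responses first, so ties do not inflate
    # the count.
    for _, kind in sorted(events, key=lambda e: (e[0], e[1] != "response")):
        if kind == "request":
            outstanding += 1
            peak = max(peak, outstanding)
        else:
            outstanding -= 1
    return peak


# Assumption: the real limit on the ASA is unknown to us.
HYPOTHETICAL_LIMIT = 10

# 13 GETs pipelined at once; responses come back slowly, one per 10 ticks.
burst = [(0, "request")] * 13 + [(10 * (i + 1), "response") for i in range(13)]

# The same 13 requests, each sent only after the previous response arrived.
paced = []
for i in range(13):
    paced += [(10 * i, "request"), (10 * i + 9, "response")]

if __name__ == "__main__":
    for name, schedule in (("burst", burst), ("paced", paced)):
        peak = peak_unanswered(schedule)
        verdict = "over limit" if peak > HYPOTHETICAL_LIMIT else "within limit"
        print(name, peak, verdict)
```

The point is only that slow responses plus eager pipelining drive the count up, while anything that changes the timing can keep it below the threshold.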

Apparently this is not a new problem, having been reported before:
https://supportforums.cisco.com/thread/2088590

They fixed the issue by switching off HTTP protocol inspection on the
ASA. Interestingly enough, this also fixed a number of intermittent
issues with plain ol' Web browsing that they had on occasion, so this is
not specific to Subversion (as the link above also suggests), but is
rather a bug^Wserious limitation of the ASA and/or IOS.

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. br...@wandisco.com


Re: Seeking help for E000054: Error retrieving REPORT: Connection reset by peer

2014-01-09 Thread Mojca Miklavec
On Wed, Jan 8, 2014 at 4:53 PM, Philip Martin wrote:

 I get a problem with the checkout from your server using a trunk client.
 Very occasionally the checkout works but most of the time the client
 simply hangs while receiving the first file.

 It appears that the client is sending the REPORT request and receiving
 the response from the server.  The client then pipelines 13 GET requests
 corresponding to the 13 files in the working copy.  The server starts
 sending the response to the first GET and the client starts receiving it
 but the server never completes the response.  The client hangs waiting
 for the server and eventually times out.

 If I use wireshark it shows the server sending an RST packet just before
 the client hangs.  According to wireshark this is a Bad checksum
 packet.  Wireshark shows the client retransmitting the GETs but there is
 no further server response.

 I don't know enough to debug the problem further.

I have now upgraded the SVN client to 1.9.0-dev from trunk. With the trunk
version the behaviour is still inconsistent, but at least reproducible
to a certain extent.

I tried to check out the file a couple of times. Almost every time I get
the following lines in error.log on the server:

Unable to deliver content.  [500, #0]
Could not write data to filter.  [500, #175002]

but the first time the whole checkout finished successfully, even
though the server first recorded 500 and apparently another 200
(success) on the second attempt for the same file. The client ended
with success.

The second time the client reported
svn: E54: Error running context: Connection reset by peer
(and the same happened when I ran it for the third/fourth/fifth/...
time) Sometimes it works though. And it usually hangs on different
files.

I'm unable to reproduce the faulty behaviour if I do a checkout from
the same network where the server is located, no matter what I try
(upgrading SVN client doesn't help triggering the error). Philip
also said that he had no problem doing a checkout with client version
1.8.5 or 1.7.

With subversion client 1.8.5 I'm sometimes able to reproduce the
problem from a different network, but it usually works. I tried
wireshark, but I don't know what to do with the zillions of packets it
shows me.

I'll first try to copy the repository to another server to see if I
could reproduce the problem from there. Other than that I would be
grateful for any hints if there exists some painless way to debug the
server.

Mojca


Re: Seeking help for E000054: Error retrieving REPORT: Connection reset by peer

2014-01-09 Thread Branko Čibej
On 09.01.2014 17:09, Mojca Miklavec wrote:
 I'm unable to reproduce the faulty behaviour if I do a checkout from
 the same network where the server is located, no matter what I try
 (upgrading SVN client doesn't help triggering the error). Philip
 also said that he had no problem doing a checkout with client version
 1.8.5 or 1.7.

This confirms my suspicion that the error is triggered by some part of
the network infrastructure between your server and the outside world.
That's why I asked if there is a load-balancer involved. It could also
be caused by some kind of transparent proxy, or even a packet analyzer.
I doubt that your server is open to the world without some kind of
security measures in place.

To be clear, I'm not saying that any of these things are configured
incorrectly; only that they may be interacting with Subversion in a way
that we don't handle well. One of the major differences between 1.7
(which works) and 1.8 (which fails) is that we try to work around issues
with non-standard behaviour of certain transparent (sic) proxies; and
we can't claim to have covered all the possibilities.

I can't see a way to figure out what's going on without help from your
network admins; we need some insight into why the connection is being
reset on the server side, and analyzing the TCP stream on the client
can't tell us that.


BTW, if you think it'd help to try a live debugging session, I'm only
about an hour's drive away from IJS.


-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. br...@wandisco.com


Re: Seeking help for E000054: Error retrieving REPORT: Connection reset by peer

2014-01-09 Thread Ben Reser
On 1/9/14, 11:19 AM, Branko Čibej wrote:
 To be clear, I'm not saying that any of these things are configured
 incorrectly; only that they may be interacting with Subversion in a way that we
 don't handle well. One of the major differences between 1.7 (which works) and
 1.8 (which fails) is that we try to work around issues with non-standard
 behaviour of certain transparent (sic) proxies; and we can't claim to have
 covered all the possibilities.

Actually we know we haven't covered all possibilities.  We had someone a while
back who had mod_security set up in such a way that it was rejecting some
request methods (think it was POST) without Content-Length (thus breaking
chunked requests).  The behavior didn't fail for the OPTIONS requests so our
probe to try and work around transparent proxies failed.

But I'm not sure what this thread would really have to do with chunked
requests, since the problem seems to be pipelining which as far as I know we
don't have any workarounds for.

We can rule out the chunked requests by disabling it by adding this to the
command line --config-option servers:global:http-chunked-requests=no and seeing
if it changes anything.  But I really doubt it based on what I've seen on this
thread.
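
As a sketch of that test, the helper below just builds the command line; it does not run anything. The function name `checkout_command` is made up for this example, and the yes/no/auto value set is taken from the 1.8 release notes linked below, so double-check it against your client version.

```python
def checkout_command(url, dest, chunked="no"):
    """Build an `svn checkout` argv with chunked requests forced on or off."""
    if chunked not in ("yes", "no", "auto"):
        raise ValueError("chunked must be 'yes', 'no' or 'auto'")
    return [
        "svn", "checkout",
        "--config-option",
        "servers:global:http-chunked-requests=" + chunked,
        url, dest,
    ]


if __name__ == "__main__":
    # The URL is the placeholder used earlier in this thread.
    cmd = checkout_command("http://svn.myserver.net/myrepo/dirA", "dirA")
    print(" ".join(cmd))
    # To actually run it: subprocess.run(cmd) from the standard library.
```

If the failure rate is the same with `chunked="no"` and with the default, chunked requests are unlikely to be involved.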

More details on what Branko is talking about and the config option I mentioned
here:
https://subversion.apache.org/docs/release-notes/1.8.html#411-length-required



Re: Seeking help for E000054: Error retrieving REPORT: Connection reset by peer

2014-01-09 Thread Philip Martin
Ben Reser b...@reser.org writes:

 Actually we know we haven't covered all possibilities.  We had someone a while
 back who had mod_security set up in such a way that it was rejecting some
 request methods (think it was POST) without Content-Length (thus breaking
 chunked requests).  The behavior didn't fail for the OPTIONS requests so our
 probe to try and work around transparent proxies failed.

 But I'm not sure what this thread would really have to do with chunked
 requests, since the problem seems to be pipelining which as far as I know we
 don't have any workarounds for.

 We can rule out the chunked requests by disabling it by adding this to the
 command line --config-option servers:global:http-chunked-requests=no and seeing
 if it changes anything.  But I really doubt it based on what I've seen on this
 thread.

Disabling chunked requests makes no difference.  I see a trunk client
failing most of the time, but occasionally it succeeds.  When it fails
it usually hangs and eventually times out, but occasionally it fails
with "Connection reset by peer".  1.8 fails like trunk, but 1.7/serf
works.

If we believe Wireshark then the server is sending an RST part way
through the response to the first of 13 pipelined GETs.  The response
is not chunked, it has Content-Length: 8407648, but the client
only receives 14480 bytes (I think that includes the headers).
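
A quick bit of arithmetic on those two numbers, as a sketch. The segment size of 1448 bytes is an assumption (a 1500-byte MTU minus 20 bytes of IP header, 20 of TCP header and 12 of TCP timestamp option); the capture itself would have to confirm it, and the byte count may include the response headers.

```python
CONTENT_LENGTH = 8407648  # from the response header
RECEIVED = 14480          # bytes the client saw before the RST

# Assumption: payload bytes per full-size TCP segment on this path.
ASSUMED_SEGMENT_PAYLOAD = 1448

fraction = RECEIVED / CONTENT_LENGTH
segments, leftover = divmod(RECEIVED, ASSUMED_SEGMENT_PAYLOAD)

if __name__ == "__main__":
    print("received {:.2%} of the announced body".format(fraction))
    print("{} segments of {} bytes, {} bytes left over".format(
        segments, ASSUMED_SEGMENT_PAYLOAD, leftover))
```

Under that assumption the client got exactly ten full-size segments, a fraction of a percent of the file, which would fit a reset arriving right after the first burst of data rather than partway through a long transfer.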

1.7/serf, which works, pipelines 13 PROPFINDs as well as 13 GETs.  If I
force trunk to pipeline the PROPFINDs using:

Index: ../src/subversion/libsvn_ra_serf/update.c
===================================================================
--- ../src/subversion/libsvn_ra_serf/update.c   (revision 1557003)
+++ ../src/subversion/libsvn_ra_serf/update.c   (working copy)
@@ -1630,7 +1630,7 @@
 
       val = svn_xml_get_attr_value("inline-props", attrs);
       if (val && (strcmp(val, "true") == 0))
-        ctx->add_props_included = TRUE;
+        ctx->add_props_included = FALSE;
 
       val = svn_xml_get_attr_value("send-all", attrs);
       if (val && (strcmp(val, "true") == 0))
@@ -1638,7 +1638,7 @@
           ctx->send_all_mode = TRUE;
 
           /* All properties are included in send-all mode. */
-          ctx->add_props_included = TRUE;
+          ctx->add_props_included = FALSE;
         }
     }
   else if (state == NONE && strcmp(name.name, "target-revision") == 0)

then the checkout with trunk starts working reliably.  Wireshark no
longer shows an RST from the server, it does however show some packets
marked "TCP Previous segment not captured" and some marked "TCP Dup
ACK".

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*


Re: Seeking help for E000054: Error retrieving REPORT: Connection reset by peer

2014-01-08 Thread Branko Čibej
Hi Mojca,

On 07.01.2014 20:58, Mojca Miklavec wrote:
 (The other problem with Error retrieving REPORT is still a mystery
 though.) Mojca 

I'm assuming your server is somewhere on the IJS network. Can you please
ask the admins there if your server is behind a load balancer?

-- Brane

-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. br...@wandisco.com


Re: Seeking help for E000054: Error retrieving REPORT: Connection reset by peer

2014-01-08 Thread Mojca Miklavec
On Wed, Jan 8, 2014 at 1:11 PM, Branko Čibej wrote:
 Hi Mojca,

 On 07.01.2014 20:58, Mojca Miklavec wrote:
 (The other problem with Error retrieving REPORT is still a mystery
 though.) Mojca

 I'm assuming your server is somewhere on the IJS network. Can you please
 ask the admins there if your server is behind a load balancer?

I asked and it is not.

Mojca


Re: Seeking help for E000054: Error retrieving REPORT: Connection reset by peer

2014-01-08 Thread Philip Martin
Mojca Miklavec mojca.miklavec.li...@gmail.com writes:

 On Wed, Jan 8, 2014 at 1:11 PM, Branko Čibej wrote:
 Hi Mojca,

 On 07.01.2014 20:58, Mojca Miklavec wrote:
 (The other problem with Error retrieving REPORT is still a mystery
 though.) Mojca

 I'm assuming your server is somewhere on the IJS network. Can you please
 ask the admins there if your server is behind a load balancer?

 I asked and it is not.

I get a problem with the checkout from your server using a trunk client.
Very occasionally the checkout works but most of the time the client
simply hangs while receiving the first file.

It appears that the client is sending the REPORT request and receiving
the response from the server.  The client then pipelines 13 GET requests
corresponding to the 13 files in the working copy.  The server starts
sending the response to the first GET and the client starts receiving it
but the server never completes the response.  The client hangs waiting
for the server and eventually times out.

If I use wireshark it shows the server sending an RST packet just before
the client hangs.  According to wireshark this is a Bad checksum
packet.  Wireshark shows the client retransmitting the GETs but there is
no further server response.

I don't know enough to debug the problem further.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*


Re: Seeking help for E000054: Error retrieving REPORT: Connection reset by peer

2014-01-07 Thread Philip Martin
Mojca Miklavec mojca.miklavec.li...@gmail.com writes:

 We have a server running Fedora which has recently been upgraded to
 version 20 and it's now running
 svn, version 1.8.5 (r1542147)

 I have a bunch of repositories served over http protocol with public
 read access and limited commit access.

 Shortly after the upgrade a weird behaviour has been noticed. Running
 svn up on the top level dir worked ok for me, but running
 svn co http://svn.myserver.net/myrepo/dirA
 fails with

 A    dirA/subdir1
 A    dirA/subdir2
 A    dirA/subdir3
 A    dirA/subdir4
 svn: E54: Error retrieving REPORT: Connection reset by peer

 The directory dirA contains one more file FILE.txt. Checking out any
 individual subdirN works and the browser is able to display the
 contents of dirA.

 Trying to click on FILE.txt in the browser sometimes works (it
 currently does) and sometimes shows an XML (like a few minutes ago,
 but I'm unable to get it now), saying something similar to the error I
 get in console***:

 svn: E175002: Unable to connect to a repository at URL
 'svn.myserver.net/myrepo/dirA'
 svn: E175002: Unexpected HTTP status 500 'Internal Server Error' on
 '/myrepo/dirA'

 svn: E160004: Additional errors:
 svn: E160004: Corrupt node-revision '2-1.0.r137/330061'

 (*** To be precise: this is the error I get after upgrading the
 repository to the latest version of SVN, I didn't try to get to this
 error before upgrading.)

 The error.log in apache says just:

 [date] [dav:error] [pid 3613] [client ip:port] Unable to deliver
 content.  [500, #0]
 [date] [dav:error] [pid 3613] [client ip:port] Could not write
 data to filter.  [500, #175002]

 I first tried if upgrading the repository would help in any way, so I did
 svnadmin dump oldrepo | svnadmin load newrepo
 and checking the relevant revision r137 cited in the error all I see
 is the following (nothing unusual):

 --- Committed revision 136 

  Started new transaction, based on original revision 137
  * editing path : dirA/FILE.txt ... done.
 * Dumped revision 137.
  * editing path : dirA/subdir1/somefile ... done.

 --- Committed revision 137 

 Checking out the same repository via http on the machine where the
 repository itself is located works fine.

 I'm using the same version of SVN (1.8.5) on Mac, but other svn
 clients on other OSes have problems as well.

 I tried checking the repository health with
 svnadmin verify /path/to/myrepo
 and all revisions passed except for some weird error in between (the
 file rev-prop-atomics.mutex is actually missing, but it isn't present
 in any other repository either):

 * Verifying repository metadata ...
 * Verifying metadata at revision 1 ...
 ...
 * Verifying metadata at revision 155 ...
 svnadmin: E160052: Revprop caching for '/path/to/myrepo/db' disabled
 because SHM infrastructure for revprop caching failed to initialize.
 svnadmin: E13: Can't open file
 '/path/to/myrepo/db/rev-prop-atomics.mutex': Permission denied
 * Verified revision 0.
 ...
 * Verified revision 160.


 I would appreciate any help or debugging hints. If necessary I can
 share the repository URL (but I would prefer to share it off-list to
 anyone interested in debugging). I can also try to debug myself, but I
 need some instructions telling me what to check. I didn't manage to
 find anything useful by googling the errors other than figuring out
 that the error was part of the code to fix a memory leak
 (http://svn.haxx.se/dev/archive-2009-08/0274.shtml).

I've not seen E54 before but it is EXFULL which is some sort of
network error.  I suppose the corruption causes some sort of output
problem.

E13 is EACCES so you are running verify without write access to the
repository.  That seems like a perfectly reasonable thing to do so we
should probably make the warning less intimidating.
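
A side note on reading these numbers: Subversion error codes below 20000 are generally plain OS errno values, and errno numbering is platform-specific. As far as I know, 54 is EXFULL on Linux but ECONNRESET on macOS and the BSDs, which would match both the "Connection reset by peer" text and the fact that the failing client runs on a Mac. The sketch below looks a code up on whatever machine it runs on. The small table of per-platform names is my own summary, not something from this thread.

```python
import errno
import os


def describe(code):
    """Name and message for an errno value on the platform running this."""
    name = errno.errorcode.get(code, "unknown")
    return "{}: {} ({})".format(code, name, os.strerror(code))


# What errno 54 means on two common platforms (my own summary of their
# errno headers; check the headers on your system if it matters).
ERRNO_54_BY_PLATFORM = {
    "linux": "EXFULL",
    "darwin": "ECONNRESET",
}

if __name__ == "__main__":
    print(describe(13))  # EACCES on the usual platforms
    print(describe(54))  # depends on where this is run
```

So the same "E54" has to be interpreted with the client's operating system in mind, not the server's.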

It's very odd that Apache is reporting corruption but both the dump/load
and verify work without problem.  Is the problem reproducible if you
restart Apache?

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*


Re: Seeking help for E000054: Error retrieving REPORT: Connection reset by peer

2014-01-07 Thread Mojca Miklavec
On Tue, Jan 7, 2014 at 12:41 PM, Philip Martin wrote:
 Mojca Miklavec writes:

 We have a server running Fedora which has recently been upgraded to
 version 20 and it's now running
 svn, version 1.8.5 (r1542147)

 I have a bunch of repositories served over http protocol with public
 read access and limited commit access.

 Shortly after the upgrade a weird behaviour has been noticed. Running
 svn up on the top level dir worked ok for me, but running
 svn co http://svn.myserver.net/myrepo/dirA
 fails with

 A    dirA/subdir1
 A    dirA/subdir2
 A    dirA/subdir3
 A    dirA/subdir4
 svn: E54: Error retrieving REPORT: Connection reset by peer

 The directory dirA contains one more file FILE.txt. Checking out any
 individual subdirN works and the browser is able to display the
 contents of dirA.

 Trying to click on FILE.txt in the browser sometimes works (it
 currently does) and sometimes shows an XML (like a few minutes ago,
 but I'm unable to get it now), saying something similar to the error I
 get in console***:

 svn: E175002: Unable to connect to a repository at URL
 'svn.myserver.net/myrepo/dirA'
 svn: E175002: Unexpected HTTP status 500 'Internal Server Error' on
 '/myrepo/dirA'

 svn: E160004: Additional errors:
 svn: E160004: Corrupt node-revision '2-1.0.r137/330061'

 (*** To be precise: this is the error I get after upgrading the
 repository to the latest version of SVN, I didn't try to get to this
 error before upgrading.)

 The error.log in apache says just:

 [date] [dav:error] [pid 3613] [client ip:port] Unable to deliver
 content.  [500, #0]
 [date] [dav:error] [pid 3613] [client ip:port] Could not write
 data to filter.  [500, #175002]

 I first tried if upgrading the repository would help in any way, so I did
 svnadmin dump oldrepo | svnadmin load newrepo
 and checking the relevant revision r137 cited in the error all I see
 is the following (nothing unusual):

 --- Committed revision 136 

  Started new transaction, based on original revision 137
  * editing path : dirA/FILE.txt ... done.
 * Dumped revision 137.
  * editing path : dirA/subdir1/somefile ... done.

 --- Committed revision 137 

 Checking out the same repository via http on the machine where the
 repository itself is located works fine.

 I'm using the same version of SVN (1.8.5) on Mac, but other svn
 clients on other OSes have problems as well.

 I tried checking the repository health with
 svnadmin verify /path/to/myrepo
 and all revisions passed except for some weird error in between (the
 file rev-prop-atomics.mutex is actually missing, but it isn't present
 in any other repository either):

 * Verifying repository metadata ...
 * Verifying metadata at revision 1 ...
 ...
 * Verifying metadata at revision 155 ...
 svnadmin: E160052: Revprop caching for '/path/to/myrepo/db' disabled
 because SHM infrastructure for revprop caching failed to initialize.
 svnadmin: E13: Can't open file
 '/path/to/myrepo/db/rev-prop-atomics.mutex': Permission denied
 * Verified revision 0.
 ...
 * Verified revision 160.


 I would appreciate any help or debugging hints. If necessary I can
 share the repository URL (but I would prefer to share it off-list to
 anyone interested in debugging). I can also try to debug myself, but I
 need some instructions telling me what to check. I didn't manage to
 find anything useful by googling the errors other than figuring out
 that the error was part of the code to fix a memory leak
 (http://svn.haxx.se/dev/archive-2009-08/0274.shtml).

 I've not seen E54 before but it is EXFULL which is some sort of
 network error.  I suppose the corruption causes some sort of output
 problem.

 E13 is EACCES so you are running verify without write access to the
 repository.  That seems like a perfectly reasonable thing to do so we
 should probably make the warning less intimidating.

 It's very odd that Apache is reporting corruption but both the dump/load
 and verify work without problem.  Is the problem reproducible if you
 restart Apache?

Yes, there is still a problem after restarting Apache. Even though it
works for me at the moment and I tried fetching from multiple
locations and servers, other users are still experiencing the same
problem. Logs on the server confirm that. (Unable to deliver content.
[500, #0] + Could not write data to filter.  [500, #175002])

Mojca


Re: Seeking help for E000054: Error retrieving REPORT: Connection reset by peer

2014-01-07 Thread Mojca Miklavec
On Tue, Jan 7, 2014 at 5:18 PM, Philip Martin wrote:
 Mojca Miklavec writes:

 Yes, there is still a problem after restarting Apache. Even though it
 works for me at the moment and I tried fetching from multiple
 locations and servers, other users are still experiencing the same
 problem. Logs on the server confirm that. (Unable to deliver content.
 [500, #0] + Could not write data to filter.  [500, #175002])

 Does the server log always contain the error:

svn: E160004: Corrupt node-revision '2-1.0.r137/330061'

I don't see that in the server log, but I was only checking error.log
written by Apache server, I don't know where else to look, but I can
check if you point me in the right direction. This error is sometimes
displayed by the client (either in XML in the browser or as an error
in the command line during svn up), but it's not consistent and it
often works properly.

It sometimes works in the first attempt, fails in the second one, and
succeeds in the third attempt again. Only seconds or minutes apart.

 Is it always '2-1.0.r137/330061'?

The exact revision reported as corrupt depends on which subfolder I'm
checking out. I believe it reports the last commit when files in that
particular subfolder were modified. (I've seen this error when
checking out two different subfolders. The number was always the same
for the same subfolder, but different for different subfolders.)

(It is a bit difficult to test because the behaviour is not consistent.)
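
One way to put a number on behaviour this inconsistent is to repeat the checkout many times and count the failures. Here is a minimal sketch. `failure_rate` and `svn_checkout_once` are names made up for this example, and each attempt checks out into a fresh temporary directory so that earlier attempts cannot influence later ones.

```python
import subprocess
import tempfile


def failure_rate(attempt, runs):
    """Call attempt() `runs` times and return the fraction that failed.

    attempt must return True on success and False on failure.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = sum(0 if attempt() else 1 for _ in range(runs))
    return failures / runs


def svn_checkout_once(url):
    """One checkout into a scratch directory; True if svn exits with 0."""
    with tempfile.TemporaryDirectory() as scratch:
        result = subprocess.run(
            ["svn", "checkout", "--quiet", url, scratch],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0


# Example (needs an svn client and network access, so it is not run here):
#   rate = failure_rate(
#       lambda: svn_checkout_once("http://svn.myserver.net/myrepo/dirA"), 20)
#   print("failed {:.0%} of attempts".format(rate))
```

Running the same loop from inside and outside the server's network, or with different client versions, gives comparable numbers instead of impressions.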

Mojca


Re: Seeking help for E000054: Error retrieving REPORT: Connection reset by peer

2014-01-07 Thread Philip Martin
Mojca Miklavec mojca.miklavec.li...@gmail.com writes:

 On Tue, Jan 7, 2014 at 5:18 PM, Philip Martin wrote:
 Mojca Miklavec writes:

 Yes, there is still a problem after restarting Apache. Even though it
 works for me at the moment and I tried fetching from multiple
 locations and servers, other users are still experiencing the same
 problem. Logs on the server confirm that. (Unable to deliver content.
 [500, #0] + Could not write data to filter.  [500, #175002])

 Does the server log always contain the error:

svn: E160004: Corrupt node-revision '2-1.0.r137/330061'

 I don't see that in the server log, but I was only checking error.log
 written by Apache server, I don't know where else to look, but I can
 check if you point me in the right direction. This error is sometimes
 displayed by the client (either in XML in the browser or as an error
 in the command line during svn up), but it's not consistent and it
 often works properly.

It would be in the Apache error log.

Are you saying that sometimes the client gets the E175002 error without
the 'Corrupt node-revision' part?

Are you saying that the client gets the 'Corrupt node-revision' error
but it is not recorded in the error log?

 It sometimes works in the first attempt, fails in the second one, and
 succeeds in the third attempt again. Only seconds or minutes apart.

 Is it always '2-1.0.r137/330061'?

 The exact revision reported as corrupt depends on which subfolder I'm
 checking out. I believe it reports the last commit when files in that
 particular subfolder were modified. (I've seen this error when
 checking out two different subfolders. The number was always the same
 for the same subfolder, but different for different subfolders.)

 (It is a bit difficult to test because the behaviour is not consistent.)

Which version of Apache are you using?  Which Apache MPM are you using?

What sort of filesystem is used for the repository?  Is it a local disk
or a network disk?

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*


Re: Seeking help for E000054: Error retrieving REPORT: Connection reset by peer

2014-01-07 Thread Mojca Miklavec
On Tue, Jan 7, 2014 at 5:47 PM, Philip Martin wrote:
 Mojca Miklavec writes:
 On Tue, Jan 7, 2014 at 5:18 PM, Philip Martin wrote:
 Mojca Miklavec writes:

 Yes, there is still a problem after restarting Apache. Even though it
 works for me at the moment and I tried fetching from multiple
 locations and servers, other users are still experiencing the same
 problem. Logs on the server confirm that. (Unable to deliver content.
 [500, #0] + Could not write data to filter.  [500, #175002])

 Does the server log always contain the error:

svn: E160004: Corrupt node-revision '2-1.0.r137/330061'

 I don't see that in the server log, but I was only checking error.log
 written by Apache server, I don't know where else to look, but I can
 check if you point me in the right direction. This error is sometimes
 displayed by the client (either in XML in the browser or as an error
 in the command line during svn up), but it's not consistent and it
 often works properly.

 It would be in the Apache error log.

Ah, OK, I see it now in the old logs. There are no such lines in the
latest logs.

 Are you saying that sometimes the client gets the E175002 error without
 the 'Corrupt node-revision' part?

Yes. I'm attaching full log (with timestamps and IPs removed) for a
certain period of time around 4th January. There are plenty of E175002
errors without any subsequent 'Corrupt node-revision' part, including
all the latest entries (not part of the attachment).

 Are you saying that the client gets the 'Corrupt node-revision' error
 but it is not recorded in the error log?

I was wrong about that. I was only checking the latest error log where
all I get is

[dav:error] [pid 42289] [IP:29011] Unable to deliver content.  [500, #0]
[dav:error] [pid 42289] [IP:29011] Could not write data to filter.
[500, #175002]

But I've found those additional errors in an old (archived) log. At the
moment I'm unable to reproduce the error 'Corrupt node-revision' both
on the client and in server logs, but the repository is still
misbehaving.

 It sometimes works in the first attempt, fails in the second one, and
 succeeds in the third attempt again. Only seconds or minutes apart.

 Is it always '2-1.0.r137/330061'?

 The exact revision reported as corrupt depends on which subfolder I'm
 checking out. I believe it reports the last commit when files in that
 particular subfolder were modified. (I've seen this error when
 checking out two different subfolders. The number was always the same
 for the same subfolder, but different for different subfolders.)

 (It is a bit difficult to test because the behaviour is not consistent.)

 Which version of Apache are you using?  Which Apache MPM are you using?

Server version: Apache/2.4.7 (Unix)

I'm not sure how to check MPM. I get

 httpd -l
Compiled in modules:
  core.c
  mod_so.c
  http_core.c

but httpd -V as suggested on some websites doesn't work. How should
I check which MPM is being used?

 What sort of filesystem is used for the repository?  Is it a local disk
 or a network disk?

The repository is stored on a local disk. I'm not sure what exactly
you are asking about the filesystem, but here are some possibly
relevant data:

 cat format
5
 cat db/fs-type
fsfs
 cat db/format
6
layout sharded 1000

(and before I upgraded the repository, db/format was 4). Is that what
you were asking or did you want to know something else?
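
For reference, a small sketch of how the db/format contents shown above can be read, and where the file for a given revision would live on disk. It assumes the usual FSFS layout with unpacked revisions, where a sharded repository keeps revision N in db/revs/(N // shard size)/N; packed shards are stored differently and are not handled here. Both function names are made up for this example.

```python
def parse_db_format(text):
    """Return (format_number, shard_size) from the text of db/format.

    shard_size is None when no "layout sharded N" line is present.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    format_number = int(lines[0])
    shard_size = None
    for ln in lines[1:]:
        parts = ln.split()
        if parts[:2] == ["layout", "sharded"]:
            shard_size = int(parts[2])
    return format_number, shard_size


def rev_file_path(revision, shard_size):
    """Relative path of an unpacked revision file (assumed FSFS layout)."""
    if shard_size is None:
        return "db/revs/{}".format(revision)
    return "db/revs/{}/{}".format(revision // shard_size, revision)


if __name__ == "__main__":
    fmt, shard = parse_db_format("6\nlayout sharded 1000\n")
    # r137 is the revision named in the 'Corrupt node-revision' message.
    print(fmt, shard, rev_file_path(137, shard))
```

With the values above, r137 would be expected at db/revs/0/137.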

Mojca


error.log
Description: Binary data


Re: Seeking help for E000054: Error retrieving REPORT: Connection reset by peer

2014-01-07 Thread Philip Martin
Mojca Miklavec mojca.miklavec.li...@gmail.com writes:

 Ah, OK, I see it now in the old logs. There are no such lines in the
 latest logs.

 The repository is stored on a local disk. I'm not sure what exactly
 you are asking about the filesystem, but here are some possibly
 relevant data:

 cat format
 5
 cat db/fs-type
 fsfs
 cat db/format
 6
 layout sharded 1000

 (and before I upgraded the repository, db/format was 4). Is that what
 you were asking or did you want to know something else?

So you used dump/load to create a new repository and then replaced the
old repository with the new repository?  If you did that while Apache
was running, without restarting Apache, then that explains the 'Corrupt
node-revision' error as you changed the data on disk.

What you are left with is some sort of intermittent network problem.  I
don't know what is causing that.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*


Re: Seeking help for E000054: Error retrieving REPORT: Connection reset by peer

2014-01-07 Thread Mojca Miklavec
On Tue, Jan 7, 2014 at 7:34 PM, Philip Martin wrote:

 So you used dump/load to create a new repository and then replaced the
 old repository with the new repository?  If you did that while Apache
 was running, without restarting Apache, then that explains the 'Corrupt
 node-revision' error as you changed the data on disk.

Ah, thanks a lot for explaining that. Yes, I did dump/load the old
repository into a new one because I wanted to test if it would solve
the problem

(on client)
svn: E54: Error retrieving REPORT: Connection reset by peer
(on server)
[dav:error] [pid 3613] [client ip] Unable to deliver content.  [500, #0]
[dav:error] [pid 3613] [client ip] Could not write data to
filter.  [500, #175002]

which it didn't. It only added a few additional problems until I
restarted Apache (I'm sorry for confusing you with those), but the
initial error E000054/175002 is still causing problems.

 What you are left with is some sort of intermittent network problem.  I
 don't know what is causing that.

Is there any way to debug that?

Thank you very much,
Mojca


Re: Seeking help for E000054: Error retrieving REPORT: Connection reset by peer

2014-01-07 Thread Ryan Schmidt

On Jan 7, 2014, at 12:01, Mojca Miklavec mojca.miklavec.li...@gmail.com wrote:

 On Tue, Jan 7, 2014 at 5:47 PM, Philip Martin wrote:
 
 Which version of Apache are you using?  Which Apache MPM are you using?
 
 Server version: Apache/2.4.7 (Unix)
 
 I'm not sure how to check MPM. I get
 
 httpd -l
 Compiled in modules:
  core.c
  mod_so.c
  http_core.c
 
 but httpd -V as suggested on some websites doesn't work. How should
 I check which MPM is being used?

In what way does “httpd -V” not work? On my Mac it gives me the answer
(“Server MPM: prefork”):

$ httpd -V
Server version: Apache/2.4.7 (Unix)
Server built:   Nov 26 2013 23:32:37
Server's Module Magic Number: 20120211:27
Server loaded:  APR 1.4.8, APR-UTIL 1.5.2
Compiled using: APR 1.4.8, APR-UTIL 1.5.2
Architecture:   64-bit
Server MPM:     prefork
  threaded:     no
    forked:     yes (variable process count)
Server compiled with
 -D APR_HAS_SENDFILE
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
 -D APR_USE_SYSVSEM_SERIALIZE
 -D APR_USE_PTHREAD_SERIALIZE
 -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
 -D APR_HAS_OTHER_CHILD
 -D AP_HAVE_RELIABLE_PIPED_LOGS
 -D DYNAMIC_MODULE_LIMIT=256
 -D HTTPD_ROOT=/opt/local
 -D SUEXEC_BIN=/opt/local/bin/suexec
 -D DEFAULT_PIDLOG=var/run/apache2/httpd.pid
 -D DEFAULT_SCOREBOARD=logs/apache_runtime_status
 -D DEFAULT_ERRORLOG=logs/error_log
 -D AP_TYPES_CONFIG_FILE=etc/apache2/mime.types
 -D SERVER_CONFIG_FILE=etc/apache2/httpd.conf



Re: Seeking help for E000054: Error retrieving REPORT: Connection reset by peer

2014-01-07 Thread Mojca Miklavec
On Tue, Jan 7, 2014 at 8:44 PM, Ryan Schmidt wrote:
 On Jan 7, 2014, at 12:01, Mojca Miklavec wrote:
 On Tue, Jan 7, 2014 at 5:47 PM, Philip Martin wrote:

 Which version of Apache are you using?  Which Apache MPM are you using?

 Server version: Apache/2.4.7 (Unix)

 I'm not sure how to check MPM. I get

 httpd -l
 Compiled in modules:
  core.c
  mod_so.c
  http_core.c

 but httpd -V as suggested on some websites doesn't work. How should
 I check which MPM is being used?

 In what way does “httpd -V” not work?

In this way:

 httpd -V
[time] [so:warn] [pid 63924] AH01574: module dav_svn_module is
already loaded, skipping
[time] [so:warn] [pid 63924] AH01574: module authz_svn_module is
already loaded, skipping
AH00548: NameVirtualHost has no effect and will be removed in the next
release /path/to/00-vhosts.conf:1
(13)Permission denied: AH02291: Cannot access directory
'/path/to/logs/1/' for error log of vhost defined at
/path/to/20-another.conf:4
...
... (repeats a bunch of times)
...
AH00014: Configuration check failed


But I saw the trick now. It wants me to use sudo httpd -V for some
reason, then it works. And yes, it's prefork in my case as well, but
that's probably no longer relevant now that one mystery with forgotten
Apache restart was solved.
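
For anyone else who needs just the MPM name, the relevant line can be filtered out of the httpd -V output. A minimal sketch, assuming the "Server MPM:" line format shown earlier in this thread; mpm_from_httpd_v is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read "httpd -V" output on stdin and print only the
# MPM name taken from the "Server MPM:" line.
mpm_from_httpd_v() {
    sed -n 's/^Server MPM:[[:space:]]*//p'
}
```

It would be used as sudo httpd -V 2>/dev/null | mpm_from_httpd_v, where discarding stderr is meant to hide the configuration warnings (assuming they are written there).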

(The other problem with Error retrieving REPORT is still a mystery though.)

Mojca


Seeking help for E000054: Error retrieving REPORT: Connection reset by peer

2014-01-04 Thread Mojca Miklavec
Hello,

We have a server running Fedora which has recently been upgraded to
version 20 and it's now running
svn, version 1.8.5 (r1542147)

I have a bunch of repositories served over http protocol with public
read access and limited commit access.

Shortly after the upgrade I noticed some weird behaviour. Running
svn up on the top level dir worked ok for me, but running
svn co http://svn.myserver.net/myrepo/dirA
fails with

A    dirA/subdir1
A    dirA/subdir2
A    dirA/subdir3
A    dirA/subdir4
svn: E000054: Error retrieving REPORT: Connection reset by peer

The directory dirA contains one more file FILE.txt. Checking out any
individual subdirN works and the browser is able to display the
contents of dirA.

Clicking on FILE.txt in the browser sometimes works (it currently
does) and sometimes shows an XML document (like a few minutes ago,
but I'm unable to get it now), saying something similar to the error I
get in the console***:

svn: E175002: Unable to connect to a repository at URL
'svn.myserver.net/myrepo/dirA'
svn: E175002: Unexpected HTTP status 500 'Internal Server Error' on
'/myrepo/dirA'

svn: E160004: Additional errors:
svn: E160004: Corrupt node-revision '2-1.0.r137/330061'

(*** To be precise: this is the error I get after upgrading the
repository to the latest version of SVN, I didn't try to get to this
error before upgrading.)

The error.log in apache says just:

[date] [dav:error] [pid 3613] [client ip:port] Unable to deliver
content.  [500, #0]
[date] [dav:error] [pid 3613] [client ip:port] Could not write
data to filter.  [500, #175002]

I first checked whether upgrading the repository would help in any way, so I did
svnadmin dump oldrepo | svnadmin load newrepo
Checking the relevant revision r137 cited in the error, all I see
is the following (nothing unusual):

------- Committed revision 136 >>>

<<< Started new transaction, based on original revision 137
     * editing path : dirA/FILE.txt ... done.
* Dumped revision 137.
     * editing path : dirA/subdir1/somefile ... done.

------- Committed revision 137 >>>

Checking out the same repository via http on the machine where the
repository itself is located works fine.

I'm using the same version of SVN (1.8.5) on Mac, but other svn
clients on other OSes have problems as well.

I tried checking the repository health with
svnadmin verify /path/to/myrepo
and all revisions passed except for some weird error in between (the
file rev-prop-atomics.mutex is actually missing, but it isn't present
in any other repository either):

* Verifying repository metadata ...
* Verifying metadata at revision 1 ...
...
* Verifying metadata at revision 155 ...
svnadmin: E160052: Revprop caching for '/path/to/myrepo/db' disabled
because SHM infrastructure for revprop caching failed to initialize.
svnadmin: E000013: Can't open file
'/path/to/myrepo/db/rev-prop-atomics.mutex': Permission denied
* Verified revision 0.
...
* Verified revision 160.
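
Since the error lines are buried between the per-revision progress lines, one way to see at a glance which error codes a verify run produced is to filter saved output. This is just a sketch, assuming error lines start with "svnadmin: E<digits>:" as above; verify_error_codes is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read "svnadmin verify" output on stdin and print each
# distinct error code once, ignoring the "* Verified revision N." lines.
verify_error_codes() {
    sed -n 's/^svnadmin: \(E[0-9][0-9]*\):.*/\1/p' | sort -u
}
```

It would be used as svnadmin verify /path/to/myrepo 2>&1 | verify_error_codes, with stderr merged in so that both progress and error lines pass through the filter.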


I would appreciate any help or debugging hints. If necessary I can
share the repository URL (but I would prefer to share it off-list to
anyone interested in debugging). I can also try to debug myself, but I
need some instructions telling me what to check. I didn't manage to
find anything useful by googling the errors other than figuring out
that the error was part of the code to fix a memory leak
(http://svn.haxx.se/dev/archive-2009-08/0274.shtml).

Thank you,
Mojca