[ 
https://issues.apache.org/jira/browse/COUCHDB-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037578#comment-13037578
 ] 

Filipe Manana commented on COUCHDB-1174:
----------------------------------------

I've run your script multiple times and I can't reproduce it. I think your 
machine is faster, which is why you're able to trigger this issue.
Doing an analysis, I see 3 possible causes:

1) When sending multipart requests to the target, we're sending an extra ending 
boundary (--some_uuid--) in some cases. For this script, all the 
documents/attachments are exactly the same, so if there were such a bug, it 
would be triggered for every document (except for the first one sent);

2) On the receiving side, the multipart parser finishes successfully before 
receiving the ending boundary marker for this particular case, and so the 
ending boundary will be part of the beginning of the next request that goes in 
the same connection;

3) Mochiweb sets SO_REUSEADDR on the listen socket. Sockets returned by the 
'accept' call inherit the same options as the listening socket. It might be 
that a socket is reused and it has some garbage left from a previous request.

For 1) and 2), since I can't reproduce this issue on 2 different machines, I've 
only done code analysis, mostly in couch_doc (multipart sender) and couch_httpd 
(multipart parser) and so far I can't find anything wrong with them.
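
To illustrate how 1) or 2) would produce the reported symptom, here is a 
minimal Python sketch. It is not CouchDB or Mochiweb code: the function names, 
the request path and the toy parser are all made up for illustration. It only 
shows that a closing boundary left unread on a keep-alive connection ends up 
glued to the method of the next request.

```python
# Illustrative sketch only; every name here is hypothetical.
# The boundary value is the one quoted in the bug report.
BOUNDARY = b"17481297448f5a282cc919203957ebd9"
CLOSING = b"--" + BOUNDARY + b"--"


def parse_method(buffer: bytes) -> bytes:
    """Return the first token of the request line, as a naive HTTP parser would."""
    request_line = buffer.split(b"\r\n", 1)[0]
    return request_line.split(b" ", 1)[0]


def consume_multipart_body(stream: bytes, stop_early: bool) -> bytes:
    """Consume one multipart body and return the bytes left on the connection.

    With stop_early=True the parser returns before reading the closing
    boundary (cause 2). An extra closing boundary written by the sender
    (cause 1) would leave the same kind of residue.
    """
    end = stream.index(CLOSING)
    if stop_early:
        return stream[end:]
    return stream[end + len(CLOSING):]


# Two requests pipelined on the same connection (the path is an assumption).
body = b"--" + BOUNDARY + b"\r\n\r\n{}\r\n" + CLOSING
next_request = b"POST /db/_revs_diff HTTP/1.1\r\nHost: target\r\n\r\n"
stream = body + next_request

bad_method = parse_method(consume_multipart_body(stream, stop_early=True))
ok_method = parse_method(consume_multipart_body(stream, stop_early=False))

print(bad_method)  # b'--17481297448f5a282cc919203957ebd9--POST'
print(ok_method)   # b'POST'
```

Either way the bytes that reach the request parser start with the closing 
boundary, which matches the method string in the report.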

Robert, do you think you can try the following mochiweb patch:

diff --git a/src/mochiweb/mochiweb_socket_server.erl b/src/mochiweb/mochiweb_socket_server.erl
index ff0d8f3..4c605a0 100644
--- a/src/mochiweb/mochiweb_socket_server.erl
+++ b/src/mochiweb/mochiweb_socket_server.erl
@@ -148,7 +148,7 @@ ipv6_supported() ->
 init(State=#mochiweb_socket_server{ip=Ip, port=Port, backlog=Backlog, nodelay=NoDelay}) ->
     process_flag(trap_exit, true),
     BaseOpts = [binary,
-                {reuseaddr, true},
+                {reuseaddr, false},
                 {packet, 0},
                 {backlog, Backlog},
                 {recbuf, ?RECBUF_SIZE},

Adding the nodelay option to the replication socket_options might reduce the 
frequency of this issue. Just add an extra field to replication objects:

"socket_options": "[{nodelay, true}, {keepalive, true}]"
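
As a rough sketch of what such a replication object could look like, the 
Python snippet below builds one as JSON. The source and target URLs are 
placeholders I made up; only the socket_options field comes from this comment. 
Note that its value is a JSON string holding an Erlang term, not a JSON array.

```python
import json

# Hypothetical source and target; replace with real database URLs.
replication = {
    "source": "http://localhost:5984/source_db",
    "target": "http://localhost:5984/target_db",
    # A string containing an Erlang term list, exactly as suggested above.
    "socket_options": "[{nodelay, true}, {keepalive, true}]",
}

payload = json.dumps(replication, indent=2)
print(payload)
```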

Anyway, this is mostly speculation, but I believe there is a small chance that 
the reuseaddr option is the problem here.

cheers

> Multipart parsing bug in new replicator
> ---------------------------------------
>
>                 Key: COUCHDB-1174
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1174
>             Project: CouchDB
>          Issue Type: Bug
>    Affects Versions: 1.2
>            Reporter: Robert Newson
>            Priority: Blocker
>         Attachments: COUCHDB-1174.sh
>
>
> It seems the new multipart savvy replicator has a bug. At high load, the 
> receiving node sees the following as the method of a new http request;
>  "--17481297448f5a282cc919203957ebd9--POST"
> instead of just "POST". The first bit looks like a multipart boundary value 
> to me.
> I'll attach a script that reproduces the error now.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
