Yep, that seems reasonable. Let me know when I can test again. :)

On Mon, Jan 23, 2012 at 1:34 AM, Filipe David Manana <fdman...@apache.org> wrote:
> Noah, I was able to reproduce your issue by tweaking the test to create
> more leaf revisions for a document:
>
> diff --git a/test/etap/242-replication-many-leaves.t b/test/etap/242-replication-many-leaves.t
> index d8d3eb9..4eb4765 100755
> --- a/test/etap/242-replication-many-leaves.t
> +++ b/test/etap/242-replication-many-leaves.t
> @@ -56,7 +56,7 @@ doc_ids() ->
>
>  doc_num_conflicts(<<"doc1">>) -> 10;
>  doc_num_conflicts(<<"doc2">>) -> 100;
> -doc_num_conflicts(<<"doc3">>) -> 286.
> +doc_num_conflicts(<<"doc3">>) -> 500.
>
>
>  main(_) ->
>
>
> With that change, I get exactly the same timeout as you get when the
> test runs the push replication. It turns out that some _bulk_docs
> requests are taking more than 30 seconds (the default replication
> connection timeout), hence the replication request retry messages.
> I verified this by timing the _bulk_docs handler to log the time it
> takes:
>
> diff --git a/src/couchdb/couch_httpd_db.erl b/src/couchdb/couch_httpd_db.erl
> index d7ecb4a..442571d 100644
> --- a/src/couchdb/couch_httpd_db.erl
> +++ b/src/couchdb/couch_httpd_db.erl
> @@ -297,6 +297,7 @@ db_req(#httpd{path_parts=[_,<<"_ensure_full_commit">>]}=Req, _Db) ->
>      send_method_not_allowed(Req, "POST");
>
>  db_req(#httpd{method='POST',path_parts=[_,<<"_bulk_docs">>]}=Req, Db) ->
> +    T0 = now(),
>      couch_stats_collector:increment({httpd, bulk_requests}),
>      couch_httpd:validate_ctype(Req, "application/json"),
>      {JsonProps} = couch_httpd:json_body_obj(Req),
> @@ -357,7 +358,9 @@ db_req(#httpd{method='POST',path_parts=[_,<<"_bulk_docs">>]}=Req, Db) ->
>              {ok, Errors} = couch_db:update_docs(Db, Docs, Options, replicated_changes),
>              ErrorsJson =
>                  lists:map(fun update_doc_result_to_json/1, Errors),
> -            send_json(Req, 201, ErrorsJson)
> +            Rr = send_json(Req, 201, ErrorsJson),
> +            ?LOG_ERROR("BULK DOCS took ~p ms~n", [timer:now_diff(now(), T0) / 1000]),
> +            Rr
>          end
>      end;
>  db_req(#httpd{path_parts=[_,<<"_bulk_docs">>]}=Req, _Db) ->
>
>
> I was getting _bulk_docs responses after 50 seconds.
>
> This convinces me there's nothing wrong with the codebase; the
> timeouts just need to be increased:
>
> diff --git a/test/etap/242-replication-many-leaves.t b/test/etap/242-replication-many-leaves.t
> index d8d3eb9..6508112 100755
> --- a/test/etap/242-replication-many-leaves.t
> +++ b/test/etap/242-replication-many-leaves.t
> @@ -77,6 +77,7 @@ test() ->
>      couch_server_sup:start_link(test_util:config_files()),
>      ibrowse:start(),
>      crypto:start(),
> +    couch_config:set("replicator", "connection_timeout", "90000", false),
>
>      Pairs = [
>          {source_db_name(), target_db_name()},
> @@ -287,6 +288,6 @@ replicate(Source, Target) ->
>      receive
>      {'DOWN', MonRef, process, Pid, Reason} ->
>          etap:is(Reason, normal, "Replication finished successfully")
> -    after 300000 ->
> +    after 900000 ->
>          etap:bail("Timeout waiting for replication to finish")
>      end.
>
> Alternatively, the test can be updated to create fewer revisions for the
> document doc3. The current number of revisions is 286, but for the test's
> purpose 205+ is enough, which should make it faster: 7000 (max URL
> length) / length(DocRevision) = 205
>
> If it's OK with you, updating the timeouts plus reducing the number from 286
> to 210 is fine for me.
>
>
>
> On Mon, Jan 23, 2012 at 12:00 AM, Noah Slater <nsla...@tumbolia.org> wrote:
> > I'm just the dumb QA guy.
> >
> > If you have some diagnostics you want me to run on my machine, I am happy
> > to.
> >
> > On Sun, Jan 22, 2012 at 11:31 PM, Filipe David Manana
> > <fdman...@apache.org> wrote:
> >
> >> On Sun, Jan 22, 2012 at 7:20 PM, Noah Slater <nsla...@tumbolia.org> wrote:
> >> > Works. How do we proceed?
> >>
> >> How long does the test run for? On 2 different physical
> >> machines, it takes about 1 minute and 10 seconds for me.
> >>
> >> Perhaps some manual replication tests could confirm if there's
> >> something wrong with the codebase, your environment or if simply
> >> increasing the timeout is not alarming.
> >>
> >> >
> >> > On Sun, Jan 22, 2012 at 7:05 PM, Noah Slater <nsla...@tumbolia.org> wrote:
> >> >
> >> >> OVAR 9000! (Testing now...)
> >> >>
> >> >>
> >> >> On Sun, Jan 22, 2012 at 6:56 PM, Filipe David Manana <fdman...@apache.org> wrote:
> >> >>
> >> >>> On Sun, Jan 22, 2012 at 6:47 PM, Noah Slater <nsla...@tumbolia.org> wrote:
> >> >>> > No change, still fails.
> >> >>>
> >> >>> Noah, to try to find out if it's due to slowness of the machine or
> >> >>> some other issue, do you think you can try to increase the following
> >> >>> timeout in the test?
> >> >>>
> >> >>> diff --git a/test/etap/242-replication-many-leaves.t b/test/etap/242-replication-many-leaves.t
> >> >>> index d8d3eb9..737cd31 100755
> >> >>> --- a/test/etap/242-replication-many-leaves.t
> >> >>> +++ b/test/etap/242-replication-many-leaves.t
> >> >>> @@ -287,6 +287,6 @@ replicate(Source, Target) ->
> >> >>>      receive
> >> >>>      {'DOWN', MonRef, process, Pid, Reason} ->
> >> >>>          etap:is(Reason, normal, "Replication finished successfully")
> >> >>> -    after 300000 ->
> >> >>> +    after 900000 ->
> >> >>>          etap:bail("Timeout waiting for replication to finish")
> >> >>>      end.
> >> >>>
> >> >>> >
> >> >>> > On Sun, Jan 22, 2012 at 6:08 PM, Noah Slater <nsla...@tumbolia.org> wrote:
> >> >>> >
> >> >>> >>
> >> >>> >> On Sun, Jan 22, 2012 at 6:01 PM, Filipe David Manana <fdman...@apache.org> wrote:
> >> >>> >>
> >> >>> >>> Noah, does it fail occasionally or every time for you?
> >> >>> >>>
> >> >>> >>
> >> >>> >> Fails every time.
> >> >>> >>
> >> >>> >>
> >> >>> >>> I'm assuming you're on a slow machine or the machine is a bit
> >> >>> >>> overloaded.
> >> >>> >>>
> >> >>> >>
> >> >>> >> Shouldn't be, I'm not doing anything else right now, and this is a new MBA.
> >> >>> >>
> >> >>> >>
> >> >>> >>> Can you try with the following patch?
> >> >>> >>>
> >> >>> >>
> >> >>> >> Yes. Will report back.
> >> >>>
> >> >>> --
> >> >>> Filipe David Manana,
> >> >>>
> >> >>> "Reasonable men adapt themselves to the world.
> >> >>> Unreasonable men adapt the world to themselves.
> >> >>> That's why all progress depends on unreasonable men."
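
The revision-count estimate quoted in the thread, 7000 (max url length) / length(DocRevision) = 205, can be checked with a few lines of Erlang. This is only a sketch: the module name rev_budget, its function names, and the sample revision string are made up for illustration. It assumes a revision of the form "1-" followed by 32 hex digits (34 characters), which the thread does not state explicitly, and it ignores separators and the rest of the URL.

```erlang
-module(rev_budget).
-export([revs_that_fit/2, example/0]).

%% Number of whole revision strings that fit into a budget of
%% MaxUrlLen characters. Separators and the rest of the URL are
%% ignored, as in the back-of-the-envelope figure from the thread.
revs_that_fit(MaxUrlLen, RevLen)
  when is_integer(MaxUrlLen), is_integer(RevLen), RevLen > 0 ->
    MaxUrlLen div RevLen.

%% Assumption (not stated in the thread): a revision looks like
%% "1-" followed by 32 hex digits, i.e. 34 characters in total.
%% The sample value below is purely illustrative.
example() ->
    Rev = <<"1-967a00dff5e02add41819138abb3284d">>,
    %% byte_size(Rev) =:= 34, and 7000 div 34 =:= 205
    revs_that_fit(7000, byte_size(Rev)).
```

After compiling with c(rev_budget). in an Erlang shell, rev_budget:example(). should return 205 under the 34-character assumption, which matches the "205+" figure given in the thread.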