Yep, that seems reasonable. Let me know when I can test again. :)

On Mon, Jan 23, 2012 at 1:34 AM, Filipe David Manana <fdman...@apache.org> wrote:

> Noah, I was able to reproduce your issue by tweaking the test to create
> more leaf revisions for a document:
>
> diff --git a/test/etap/242-replication-many-leaves.t b/test/etap/242-replication-many-leaves.t
> index d8d3eb9..4eb4765 100755
> --- a/test/etap/242-replication-many-leaves.t
> +++ b/test/etap/242-replication-many-leaves.t
> @@ -56,7 +56,7 @@ doc_ids() ->
>
>  doc_num_conflicts(<<"doc1">>) -> 10;
>  doc_num_conflicts(<<"doc2">>) -> 100;
> -doc_num_conflicts(<<"doc3">>) -> 286.
> +doc_num_conflicts(<<"doc3">>) -> 500.
>
>
>  main(_) ->
>
>
> With that change, I get exactly the same timeout as you do when the
> test runs the push replication. It turns out that some _bulk_docs
> requests take more than 30 seconds (the default replication
> connection timeout), hence the replication request retry messages.
> I verified this by timing the _bulk_docs handler and logging how long
> it takes:
>
> diff --git a/src/couchdb/couch_httpd_db.erl b/src/couchdb/couch_httpd_db.erl
> index d7ecb4a..442571d 100644
> --- a/src/couchdb/couch_httpd_db.erl
> +++ b/src/couchdb/couch_httpd_db.erl
> @@ -297,6 +297,7 @@ db_req(#httpd{path_parts=[_,<<"_ensure_full_commit">>]}=Req, _Db) ->
>      send_method_not_allowed(Req, "POST");
>
>  db_req(#httpd{method='POST',path_parts=[_,<<"_bulk_docs">>]}=Req, Db) ->
> +    T0 = now(),
>      couch_stats_collector:increment({httpd, bulk_requests}),
>      couch_httpd:validate_ctype(Req, "application/json"),
>      {JsonProps} = couch_httpd:json_body_obj(Req),
> @@ -357,7 +358,9 @@ db_req(#httpd{method='POST',path_parts=[_,<<"_bulk_docs">>]}=Req, Db) ->
>              {ok, Errors} = couch_db:update_docs(Db, Docs, Options, replicated_changes),
>              ErrorsJson =
>                  lists:map(fun update_doc_result_to_json/1, Errors),
> -            send_json(Req, 201, ErrorsJson)
> +            Rr = send_json(Req, 201, ErrorsJson),
> +            ?LOG_ERROR("BULK DOCS took ~p ms~n", [timer:now_diff(now(), T0) / 1000]),
> +            Rr
>          end
>      end;
>  db_req(#httpd{path_parts=[_,<<"_bulk_docs">>]}=Req, _Db) ->
>
>
> I was seeing _bulk_docs responses arrive after 50 seconds.
>
> This convinces me there's nothing wrong with the codebase; the
> timeouts just need to be increased:
>
> diff --git a/test/etap/242-replication-many-leaves.t b/test/etap/242-replication-many-leaves.t
> index d8d3eb9..6508112 100755
> --- a/test/etap/242-replication-many-leaves.t
> +++ b/test/etap/242-replication-many-leaves.t
> @@ -77,6 +77,7 @@ test() ->
>      couch_server_sup:start_link(test_util:config_files()),
>      ibrowse:start(),
>      crypto:start(),
> +    couch_config:set("replicator", "connection_timeout", "90000", false),
>
>      Pairs = [
>          {source_db_name(), target_db_name()},
> @@ -287,6 +288,6 @@ replicate(Source, Target) ->
>      receive
>      {'DOWN', MonRef, process, Pid, Reason} ->
>          etap:is(Reason, normal, "Replication finished successfully")
> -    after 300000 ->
> +    after 900000 ->
>          etap:bail("Timeout waiting for replication to finish")
>      end.
>
> Alternatively, the test can be updated to create fewer revisions for
> the document doc3. It currently creates 286, but for the test's
> purpose 205 or more is enough, which should make it faster:
> 7000 (max URL length) / length(DocRevision) = 205
>
> If it's ok for you, updating the timeouts plus reducing the # from 286
> to 210 is fine for me.
>
>
>
> On Mon, Jan 23, 2012 at 12:00 AM, Noah Slater <nsla...@tumbolia.org>
> wrote:
> > I'm just the dumb QA guy.
> >
> > If you have some diagnostics you want me to run on my machine, I am happy
> > to.
> >
> > On Sun, Jan 22, 2012 at 11:31 PM, Filipe David Manana
> > <fdman...@apache.org> wrote:
> >
> >> On Sun, Jan 22, 2012 at 7:20 PM, Noah Slater <nsla...@tumbolia.org> wrote:
> >> > Works. How do we proceed?
> >>
> >> How long does the test run for? On 2 different physical
> >> machines, it takes about 1 minute and 10 seconds for me.
> >>
> >> Perhaps some manual replication tests could confirm whether there's
> >> something wrong with the codebase or your environment, or whether
> >> simply increasing the timeout is an acceptable fix.
> >>
> >> >
> >> > On Sun, Jan 22, 2012 at 7:05 PM, Noah Slater <nsla...@tumbolia.org> wrote:
> >> >
> >> >> OVAR 9000! (Testing now...)
> >> >>
> >> >>
> >> >> On Sun, Jan 22, 2012 at 6:56 PM, Filipe David Manana <fdman...@apache.org> wrote:
> >> >>
> >> >>> On Sun, Jan 22, 2012 at 6:47 PM, Noah Slater <nsla...@tumbolia.org>
> >> >>> wrote:
> >> >>> > No change, still fails.
> >> >>>
> >> >>> Noah, to try to find out if it's due to slowness of the machine or
> >> >>> some other issue, do you think you can try to increase the following
> >> >>> timeout in the test?
> >> >>>
> >> >>> diff --git a/test/etap/242-replication-many-leaves.t b/test/etap/242-replication-many-leaves.t
> >> >>> index d8d3eb9..737cd31 100755
> >> >>> --- a/test/etap/242-replication-many-leaves.t
> >> >>> +++ b/test/etap/242-replication-many-leaves.t
> >> >>> @@ -287,6 +287,6 @@ replicate(Source, Target) ->
> >> >>>     receive
> >> >>>     {'DOWN', MonRef, process, Pid, Reason} ->
> >> >>>         etap:is(Reason, normal, "Replication finished successfully")
> >> >>> -    after 300000 ->
> >> >>> +    after 900000 ->
> >> >>>         etap:bail("Timeout waiting for replication to finish")
> >> >>>     end.
> >> >>>
> >> >>> >
> >> >>> > On Sun, Jan 22, 2012 at 6:08 PM, Noah Slater <nsla...@tumbolia.org> wrote:
> >> >>> >
> >> >>> >>
> >> >>> >> On Sun, Jan 22, 2012 at 6:01 PM, Filipe David Manana <fdman...@apache.org> wrote:
> >> >>> >>
> >> >>> >>> Noah, does it fail occasionally or every time for you?
> >> >>> >>>
> >> >>> >>
> >> >>> >> Fails every time.
> >> >>> >>
> >> >>> >>
> >> >>> >>> I'm assuming you're on a slow machine or the machine is a bit
> >> >>> >>> overloaded.
> >> >>> >>>
> >> >>> >>
> >> >>> >> Shouldn't be, I'm not doing anything else right now, and this is a
> >> >>> >> new MBA.
> >> >>> >>
> >> >>> >>
> >> >>> >>> Can you try with the following patch?
> >> >>> >>>
> >> >>> >>
> >> >>> >> Yes. Will report back.
> >> >>> >>
> >> >>> >>
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Filipe David Manana,
> >> >>>
> >> >>> "Reasonable men adapt themselves to the world.
> >> >>>  Unreasonable men adapt the world to themselves.
> >> >>>  That's why all progress depends on unreasonable men."
> >> >>>
> >> >>
> >> >>
>
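
For reference, the revision-count estimate quoted above (7000 / length(DocRevision) = 205) can be checked with a few lines of Erlang. This is only a sketch: the module and function names are made up for illustration, and the 34-byte revision length (one generation digit, a dash, and 32 hex characters) is an assumption, chosen because it is the length for which the quoted arithmetic comes out to 205.

```erlang
-module(rev_estimate).
-export([revs_per_url/2, example/0]).

%% Number of whole revision strings of RevLen bytes that fit in
%% MaxUrlLen bytes. Separators and the rest of the URL are ignored,
%% as in the quoted estimate.
revs_per_url(MaxUrlLen, RevLen)
  when is_integer(MaxUrlLen), is_integer(RevLen), RevLen > 0 ->
    MaxUrlLen div RevLen.

%% Hypothetical revision (assumption): "1-" followed by 32 hex
%% characters, 34 bytes in total.
example() ->
    Rev = <<"1-", (binary:copy(<<"0">>, 32))/binary>>,
    revs_per_url(7000, byte_size(Rev)).
```

Under those assumptions rev_estimate:example() evaluates to 205, so the proposed count of 210 sits just above the estimate.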