[ https://issues.apache.org/jira/browse/COUCHDB-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620109#comment-13620109 ]
Wendall Cada commented on COUCHDB-1757:
---------------------------------------

My guess is that there is an error in either the change handler or the filter somewhere. When I encountered this type of problem before, the actual error was in the logs, but I kept missing it because it happened well before the _replicator crash. Try searching a bit further back in the logs and see if there are any other errors.

> CouchDB 1.3.0rc3 crashes when _replicator contains a lot of docs
> ----------------------------------------------------------------
>
>                 Key: COUCHDB-1757
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1757
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>            Reporter: Sander Dijkhuis
>
> I’m deploying an experimental game based on CouchDB with one user per
> database. For access control, I’m using several _replicator docs per user:
> - one filtered replication from the shared db to the user db,
> - one unfiltered replication from the user db to the shared db,
> - two replications using doc_ids per ‘friendship’ (to share both profiles).
> At the moment, this results in 420 continuous replications running. CouchDB
> 1.3.0rc3 on Ubuntu crashes a couple of seconds after starting, and doesn’t
> crash when I temporarily remove the _replicator database. When I used
> 1.3.0rc1, CouchDB would crash after a few minutes to a few hours.
> Some details from the crash report are below, filtered for privacy, to avoid
> repetition and to hide the _design doc that’s shown in the log. Let me know
> if you need more detail or if I should share one of the _design functions
> used.
> Am I abusing the replication system, or can I change a setting to allow for
> longer timeouts?
> --
> First, I get something like this for each _replicator doc:
> {code}
> [info] [<0.5368.0>] Replication
> `"5529b4bdb9c5bdc15b558bd7588511d9+continuous"` is using:
> 4 worker processes
> a worker batch size of 500
> 20 HTTP connections
> a connection timeout of 30000 milliseconds
> 10 retries per request
> socket options are: [{keepalive,true},{nodelay,false}]
> source start sequence 6908
> [info] [<0.5368.0>] Document `lunacy:to:USERNAME` triggered replication
> `5529b4bdb9c5bdc15b558bd7588511d9+continuous`
> [info] [<0.1213.0>] starting new replication
> `5529b4bdb9c5bdc15b558bd7588511d9+continuous` at <0.5368.0> (`lunacy` ->
> `lunacy/user/USERNAME`)
> {code}
> Then:
> {code}
> [error] [<0.5408.0>] OS Process died with status: 137
> [error] [<0.5408.0>] ** Generic server <0.5408.0> terminating
> ** Last message in was {#Port<0.2740>,{exit_status,137}}
> ** When Server state ==
> {os_proc,"/home/sander/git/apache-couchdb-1.3.0/build/bin/couchjs
> /home/sander/git/apache-couchdb-1.3.0/build/share/couchdb/server/main.js",
> #Port<0.2740>,
> #Fun<couch_os_process.2.132569728>,
> #Fun<couch_os_process.3.35601548>,5000}
> ** Reason for termination ==
> ** {exit_status,137}
> {code}
> Followed by:
> {code}
> =ERROR REPORT==== 2-Apr-2013::19:18:20 ===
> ** Generic server <0.5408.0> terminating
> ** Last message in was {#Port<0.2740>,{exit_status,137}}
> ** When Server state ==
> {os_proc,"/home/sander/git/apache-couchdb-1.3.0/build/bin/couchjs
> /home/sander/git/apache-couchdb-1.3.0/build/share/couchdb/server/main.js",
> #Port<0.2740>,
> #Fun<couch_os_process.2.132569728>,
> #Fun<couch_os_process.3.35601548>,5000}
> ** Reason for termination ==
> ** {exit_status,137}
> [error] [<0.5408.0>] {error_report,<0.31.0>,
> {<0.5408.0>,crash_report,
> [[{initial_call,
> {couch_os_process,init,['Argument__1']}},
> {pid,<0.5408.0>},
> {registered_name,[]},
> {error_info,
> {exit,
> {exit_status,137},
> [{gen_server,terminate,6},
> {proc_lib,init_p_do_apply,3}]}},
{ancestors, > [couch_query_servers,couch_secondary_services, > couch_server_sup,<0.32.0>]}, > {messages,[]}, > {links,[<0.111.0>,<0.5339.0>]}, > {dictionary,[]}, > {trap_exit,false}, > {status,running}, > {heap_size,1597}, > {stack_size,24}, > {reductions,1197}], > [{neighbour, > [{pid,<0.5345.0>}, > {registered_name,[]}, > {initial_call, > {couch_event_sup,init,['Argument__1']}}, > {current_function,{gen_server,loop,6}}, > {ancestors,[<0.5339.0>]}, > {messages,[]}, > {links,[<0.5339.0>,<0.89.0>]}, > {dictionary,[]}, > {trap_exit,false}, > {status,waiting}, > {heap_size,987}, > {stack_size,9}, > {reductions,32}]}, > {neighbour, > [{pid,<0.5339.0>}, > {registered_name,[]}, > {initial_call,{erlang,apply,2}}, > {current_function,{gen,do_call,4}}, > {ancestors,[]}, > {messages,[]}, > {links,[<0.5345.0>,<0.5408.0>,<0.5335.0>]}, > {dictionary,[]}, > {trap_exit,false}, > {status,waiting}, > {heap_size,6765}, > {stack_size,104}, > {reductions,1988}]}]]}} > =CRASH REPORT==== 2-Apr-2013::19:18:21 === > crasher: > initial call: couch_os_process:init/1 > pid: <0.5408.0> > registered_name: [] > exception exit: {exit_status,137} > in function gen_server:terminate/6 > ancestors: [couch_query_servers,couch_secondary_services, > couch_server_sup,<0.32.0>] > messages: [] > links: [<0.111.0>,<0.5339.0>] > dictionary: [] > trap_exit: false > status: running > heap_size: 1597 > stack_size: 24 > reductions: 1197 > neighbours: > neighbour: [{pid,<0.5345.0>}, > {registered_name,[]}, > {initial_call,{couch_event_sup,init,['Argument__1']}}, > {current_function,{gen_server,loop,6}}, > {ancestors,[<0.5339.0>]}, > {messages,[]}, > {links,[<0.5339.0>,<0.89.0>]}, > {dictionary,[]}, > {trap_exit,false}, > {status,waiting}, > {heap_size,987}, > {stack_size,9}, > {reductions,32}] > neighbour: [{pid,<0.5339.0>}, > {registered_name,[]}, > {initial_call,{erlang,apply,2}}, > {current_function,{gen,do_call,4}}, > {ancestors,[]}, > {messages,[]}, > {links,[<0.5345.0>,<0.5408.0>,<0.5335.0>]}, > 
{dictionary,[]}, > {trap_exit,false}, > {status,waiting}, > {heap_size,6765}, > {stack_size,104}, > {reductions,1988}] > [error] [<0.5335.0>] ChangesReader process died with reason: {exit_status,137} > [error] [<0.111.0>] OS Process Error <0.5412.0> :: {os_process_error, > "OS process timed out."} > [error] [<0.5387.0>] OS Process died with status: 137 > [error] [<0.5385.0>] OS Process died with status: 137 > [error] [<0.5335.0>] Replication > `f7ecf7f435811899c912619f899f24b4+continuous` (`lunacy` -> > `lunacy/user/USERNAME`) failed: changes_reader_died > [error] [<0.5258.0>] ChangesReader process died with reason: shutdown > [error] [<0.5387.0>] ** Generic server <0.5387.0> terminating > ** Last message in was {#Port<0.2730>,{exit_status,137}} > ** When Server state == > {os_proc,"/home/sander/git/apache-couchdb-1.3.0/build/bin/couchjs > /home/sander/git/apache-couchdb-1.3.0/build/share/couchdb/server/main.js", > #Port<0.2730>, > #Fun<couch_os_process.2.132569728>, > #Fun<couch_os_process.3.35601548>,5000} > ** Reason for termination == > ** {exit_status,137} > =ERROR REPORT==== 2-Apr-2013::19:18:21 === > ** Generic server <0.5387.0> terminating > ** Last message in was {#Port<0.2730>,{exit_status,137}} > ** When Server state == > {os_proc,"/home/sander/git/apache-couchdb-1.3.0/build/bin/couchjs > /home/sander/git/apache-couchdb-1.3.0/build/share/couchdb/server/main.js", > #Port<0.2730>, > #Fun<couch_os_process.2.132569728>, > #Fun<couch_os_process.3.35601548>,5000} > ** Reason for termination == > ** {exit_status,137} > [error] [<0.5385.0>] ** Generic server <0.5385.0> terminating > ** Last message in was {#Port<0.2729>,{exit_status,137}} > ** When Server state == > {os_proc,"/home/sander/git/apache-couchdb-1.3.0/build/bin/couchjs > /home/sander/git/apache-couchdb-1.3.0/build/share/couchdb/server/main.js", > #Port<0.2729>, > #Fun<couch_os_process.2.132569728>, > #Fun<couch_os_process.3.35601548>,5000} > ** Reason for termination == > ** {exit_status,137} > 
=ERROR REPORT==== 2-Apr-2013::19:18:21 === > ** Generic server <0.5385.0> terminating > ** Last message in was {#Port<0.2729>,{exit_status,137}} > ** When Server state == > {os_proc,"/home/sander/git/apache-couchdb-1.3.0/build/bin/couchjs > /home/sander/git/apache-couchdb-1.3.0/build/share/couchdb/server/main.js", > #Port<0.2729>, > #Fun<couch_os_process.2.132569728>, > #Fun<couch_os_process.3.35601548>,5000} > ** Reason for termination == > ** {exit_status,137} > [error] [<0.5385.0>] {error_report,<0.31.0>, > {<0.5385.0>,crash_report, > [[{initial_call, > {couch_os_process,init,['Argument__1']}}, > {pid,<0.5385.0>}, > {registered_name,[]}, > {error_info, > {exit, > {exit_status,137}, > [{gen_server,terminate,6}, > {proc_lib,init_p_do_apply,3}]}}, > {ancestors, > [couch_query_servers,couch_secondary_services, > couch_server_sup,<0.32.0>]}, > {messages,[]}, > {links,[<0.111.0>,<0.5207.0>]}, > {dictionary,[]}, > {trap_exit,false}, > {status,running}, > {heap_size,1597}, > {stack_size,24}, > {reductions,1205}], > [{neighbour, > [{pid,<0.5213.0>}, > {registered_name,[]}, > {initial_call, > {couch_event_sup,init,['Argument__1']}}, > {current_function,{gen_server,loop,6}}, > {ancestors,[<0.5207.0>]}, > {messages,[]}, > {links,[<0.5207.0>,<0.89.0>]}, > {dictionary,[]}, > {trap_exit,false}, > {status,waiting}, > {heap_size,987}, > {stack_size,9}, > {reductions,32}]}, > {neighbour, > [{pid,<0.5207.0>}, > {registered_name,[]}, > {initial_call,{erlang,apply,2}}, > {current_function,{gen,do_call,4}}, > {ancestors,[]}, > {messages,[]}, > {links,[<0.5213.0>,<0.5385.0>,<0.5203.0>]}, > {dictionary,[]}, > {trap_exit,false}, > {status,waiting}, > {heap_size,6765}, > {stack_size,104}, > {reductions,1988}]}]]}} > =CRASH REPORT==== 2-Apr-2013::19:18:22 === > crasher: > initial call: couch_os_process:init/1 > pid: <0.5385.0> > registered_name: [] > exception exit: {exit_status,137} > in function gen_server:terminate/6 > ancestors: [couch_query_servers,couch_secondary_services, > 
couch_server_sup,<0.32.0>] > messages: [] > links: [<0.111.0>,<0.5207.0>] > dictionary: [] > trap_exit: false > status: running > heap_size: 1597 > stack_size: 24 > reductions: 1205 > neighbours: > neighbour: [{pid,<0.5213.0>}, > {registered_name,[]}, > {initial_call,{couch_event_sup,init,['Argument__1']}}, > {current_function,{gen_server,loop,6}}, > {ancestors,[<0.5207.0>]}, > {messages,[]}, > {links,[<0.5207.0>,<0.89.0>]}, > {dictionary,[]}, > {trap_exit,false}, > {status,waiting}, > {heap_size,987}, > {stack_size,9}, > {reductions,32}] > neighbour: [{pid,<0.5207.0>}, > {registered_name,[]}, > {initial_call,{erlang,apply,2}}, > {current_function,{gen,do_call,4}}, > {ancestors,[]}, > {messages,[]}, > {links,[<0.5213.0>,<0.5385.0>,<0.5203.0>]}, > {dictionary,[]}, > {trap_exit,false}, > {status,waiting}, > {heap_size,6765}, > {stack_size,104}, > {reductions,1988}] > [error] [<0.5387.0>] {error_report,<0.31.0>, > {<0.5387.0>,crash_report, > [[{initial_call, > {couch_os_process,init,['Argument__1']}}, > {pid,<0.5387.0>}, > {registered_name,[]}, > {error_info, > {exit, > {exit_status,137}, > [{gen_server,terminate,6}, > {proc_lib,init_p_do_apply,3}]}}, > {ancestors, > [couch_query_servers,couch_secondary_services, > couch_server_sup,<0.32.0>]}, > {messages,[]}, > {links,[<0.111.0>,<0.5218.0>]}, > {dictionary,[]}, > {trap_exit,false}, > {status,running}, > {heap_size,1597}, > {stack_size,24}, > {reductions,1205}], > [{neighbour, > [{pid,<0.5224.0>}, > {registered_name,[]}, > {initial_call, > {couch_event_sup,init,['Argument__1']}}, > {current_function,{gen_server,loop,6}}, > {ancestors,[<0.5218.0>]}, > {messages,[]}, > {links,[<0.5218.0>,<0.89.0>]}, > {dictionary,[]}, > {trap_exit,false}, > {status,waiting}, > {heap_size,987}, > {stack_size,9}, > {reductions,32}]}, > {neighbour, > [{pid,<0.5218.0>}, > {registered_name,[]}, > {initial_call,{erlang,apply,2}}, > {current_function,{gen,do_call,4}}, > {ancestors,[]}, > {messages,[]}, > 
{links,[<0.5224.0>,<0.5387.0>,<0.5214.0>]}, > {dictionary,[]}, > {trap_exit,false}, > {status,waiting}, > {heap_size,6765}, > {stack_size,104}, > {reductions,1947}]}]]}} > =CRASH REPORT==== 2-Apr-2013::19:18:24 === > crasher: > initial call: couch_os_process:init/1 > pid: <0.5387.0> > registered_name: [] > exception exit: {exit_status,137} > in function gen_server:terminate/6 > ancestors: [couch_query_servers,couch_secondary_services, > couch_server_sup,<0.32.0>] > messages: [] > links: [<0.111.0>,<0.5218.0>] > dictionary: [] > trap_exit: false > status: running > heap_size: 1597 > stack_size: 24 > reductions: 1205 > neighbours: > neighbour: [{pid,<0.5224.0>}, > {registered_name,[]}, > {initial_call,{couch_event_sup,init,['Argument__1']}}, > {current_function,{gen_server,loop,6}}, > {ancestors,[<0.5218.0>]}, > {messages,[]}, > {links,[<0.5218.0>,<0.89.0>]}, > {dictionary,[]}, > {trap_exit,false}, > {status,waiting}, > {heap_size,987}, > {stack_size,9}, > {reductions,32}] > neighbour: [{pid,<0.5218.0>}, > {registered_name,[]}, > {initial_call,{erlang,apply,2}}, > {current_function,{gen,do_call,4}}, > {ancestors,[]}, > {messages,[]}, > {links,[<0.5224.0>,<0.5387.0>,<0.5214.0>]}, > {dictionary,[]}, > {trap_exit,false}, > {status,waiting}, > {heap_size,6765}, > {stack_size,104}, > {reductions,1947}] > [error] [<0.5302.0>] ChangesReader process died with reason: shutdown > [error] [<0.5192.0>] ChangesReader process died with reason: shutdown > [error] [<0.5203.0>] ChangesReader process died with reason: {exit_status,137} > [error] [<0.5214.0>] ChangesReader process died with reason: {exit_status,137} > [error] [<0.3692.0>] ChangesReader process died with reason: shutdown > [error] [<0.5258.0>] Replication > `3d6539a2a9e3201a6eacd0b7db4c7dd3+continuous` (`lunacy` -> > `lunacy/user/USERNAME`) failed: changes_reader_died > [error] [<0.5170.0>] ChangesReader process died with reason: shutdown > [error] [<0.5236.0>] ChangesReader process died with reason: shutdown > [error] 
[<0.5280.0>] ChangesReader process died with reason: shutdown > [error] [<0.5225.0>] ChangesReader process died with reason: shutdown > [error] [<0.5324.0>] ChangesReader process died with reason: shutdown > [error] [<0.5291.0>] ChangesReader process died with reason: shutdown > [error] [<0.5313.0>] ChangesReader process died with reason: shutdown > [error] [<0.5181.0>] ChangesReader process died with reason: shutdown > [error] [<0.5269.0>] ChangesReader process died with reason: shutdown > [error] [<0.111.0>] ** Generic server couch_query_servers terminating > ** Last message in was {get_proc,{doc,<<"_design/server">>, > {31, > > [<<2,129,73,127,145,177,85,156,51,70,79, > 122,210,226,20,220>>, (ET CETERA) > [],false,[]}, > {<<"_design/server">>, > <<"31-0281497f91b1559c33464f7ad2e214dc">>}} > ** When Server state == {qserver,32811,41005,45102,36908,[], > {[{<<"reduce_limit">>,true}, > {<<"timeout">>,5000}]}} > ** Reason for termination == > ** {bad_return_value,{os_process_error,"OS process timed out."}} > {code} > And finally: > {code} > {'$gen_call', > {<0.3696.0>,#Ref<0.0.0.31225>}, > {unlink_proc,<0.3714.0>}}, > {'$gen_call', > {<0.5174.0>,#Ref<0.0.0.31231>}, > {unlink_proc,<0.5379.0>}}, > {'$gen_call', > {<0.5185.0>,#Ref<0.0.0.31237>}, > {unlink_proc,<0.5381.0>}}, > {'$gen_call', > {<0.5196.0>,#Ref<0.0.0.31243>}, > {unlink_proc,<0.5383.0>}}, > {'$gen_call', > {<0.5207.0>,#Ref<0.0.0.31249>}, > {unlink_proc,<0.5385.0>}}, > {'$gen_call', > {<0.5218.0>,#Ref<0.0.0.31255>}, > {unlink_proc,<0.5387.0>}}, > {'$gen_call', > {<0.5229.0>,#Ref<0.0.0.31261>}, > {unlink_proc,<0.5389.0>}}, > {'$gen_call', > {<0.5240.0>,#Ref<0.0.0.31267>}, > {unlink_proc,<0.5391.0>}}, > {'$gen_call', > {<0.5262.0>,#Ref<0.0.0.31273>}, > {unlink_proc,<0.5393.0>}}, > {'$gen_call', > {<0.5273.0>,#Ref<0.0.0.31299>}, > {unlink_proc,<0.5395.0>}}, > {'$gen_call', > {<0.5284.0>,#Ref<0.0.0.31305>}, > {unlink_proc,<0.5398.0>}}, > {'$gen_call', > {<0.5295.0>,#Ref<0.0.0.31311>}, > 
{unlink_proc,<0.5400.0>}}, > {'$gen_call', > {<0.5306.0>,#Ref<0.0.0.31317>}, > {unlink_proc,<0.5402.0>}}, > {'$gen_call', > {<0.5317.0>,#Ref<0.0.0.31323>}, > {unlink_proc,<0.5404.0>}}, > {'$gen_call', > {<0.5328.0>,#Ref<0.0.0.31329>}, > {unlink_proc,<0.5406.0>}}, > {'$gen_call', > {<0.5339.0>,#Ref<0.0.0.31359>}, > {unlink_proc,<0.5408.0>}}, > {'EXIT',<0.5408.0>,{exit_status,137}}, > {'DOWN',#Ref<0.0.0.31331>,process,<0.5408.0>, > {exit_status,137}}, > {'EXIT',<0.5412.0>,normal}, > > {'DOWN',#Ref<0.0.0.31360>,process,<0.5412.0>,normal}, > {'DOWN',#Ref<0.0.0.31269>,process,<0.5393.0>, > shutdown}, > {'DOWN',#Ref<0.0.0.21467>,process,<0.3714.0>, > shutdown}, > {'DOWN',#Ref<0.0.0.31313>,process,<0.5402.0>, > shutdown}, > {'DOWN',#Ref<0.0.0.31239>,process,<0.5383.0>, > shutdown}, > {'DOWN',#Ref<0.0.0.31245>,process,<0.5385.0>, > {exit_status,137}}, > {'EXIT',<0.5387.0>,{exit_status,137}}, > {'DOWN',#Ref<0.0.0.31251>,process,<0.5387.0>, > {exit_status,137}}, > {'DOWN',#Ref<0.0.0.31227>,process,<0.5379.0>, > shutdown}, > {'DOWN',#Ref<0.0.0.31263>,process,<0.5391.0>, > shutdown}, > {'DOWN',#Ref<0.0.0.31257>,process,<0.5389.0>, > shutdown}, > {'DOWN',#Ref<0.0.0.31301>,process,<0.5398.0>, > shutdown}, > {'DOWN',#Ref<0.0.0.31325>,process,<0.5406.0>, > shutdown}, > {'DOWN',#Ref<0.0.0.31319>,process,<0.5404.0>, > shutdown}, > {'DOWN',#Ref<0.0.0.31307>,process,<0.5400.0>, > shutdown}, > {'DOWN',#Ref<0.0.0.31233>,process,<0.5381.0>, > shutdown}, > {'DOWN',#Ref<0.0.0.31275>,process,<0.5395.0>, > shutdown}]}, > {links,[<0.94.0>]}, > {dictionary,[]}, > {trap_exit,true}, > {status,running}, > {heap_size,17711}, > {stack_size,24}, > {reductions,7801}], > []]}} > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
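For reference, the replication layout described in the report (one filtered and one unfiltered replication per user, plus two doc_ids replications per friendship) can be modelled as plain _replicator documents. The Python sketch below only builds the documents; it does not talk to CouchDB. The `source`, `target`, `continuous`, `filter`, `query_params` and `doc_ids` fields are standard _replicator document fields. The filter name `lunacy/by_owner`, its `owner` parameter, the profile document ids and every document id other than `lunacy:to:USERNAME` (which appears in the log) are hypothetical, and so is the direction of the friendship replications, which the report does not specify.

```python
"""Sketch: build the _replicator documents for the layout in COUCHDB-1757.

Hypothetical names (not from the report): the filter "lunacy/by_owner",
its "owner" query parameter, the "lunacy:from:..." and "profile:..." ids,
and the direction of the friendship replications.
"""
from itertools import chain

SHARED_DB = "lunacy"  # shared database name, as seen in the log


def user_db(username: str) -> str:
    # The log shows replication targets of the form `lunacy/user/USERNAME`.
    return f"{SHARED_DB}/user/{username}"


def user_replications(username: str) -> list[dict]:
    """Two replications per user: filtered shared->user, unfiltered user->shared."""
    return [
        {
            "_id": f"{SHARED_DB}:to:{username}",  # id pattern seen in the log
            "source": SHARED_DB,
            "target": user_db(username),
            "continuous": True,
            "filter": "lunacy/by_owner",          # hypothetical filter name
            "query_params": {"owner": username},  # hypothetical parameter
        },
        {
            "_id": f"{SHARED_DB}:from:{username}",  # hypothetical id
            "source": user_db(username),
            "target": SHARED_DB,
            "continuous": True,
        },
    ]


def friendship_replications(a: str, b: str) -> list[dict]:
    """Two doc_ids replications per friendship, one for each profile."""

    def share(owner: str, friend: str) -> dict:
        return {
            "_id": f"profile:{owner}:to:{friend}",  # hypothetical id
            "source": user_db(owner),
            "target": user_db(friend),
            "continuous": True,
            "doc_ids": [f"profile:{owner}"],  # hypothetical profile doc id
        }

    return [share(a, b), share(b, a)]


def all_replications(users, friendships) -> list[dict]:
    # Flatten the per-user and per-friendship lists into one list of docs.
    return list(chain(
        chain.from_iterable(user_replications(u) for u in users),
        chain.from_iterable(friendship_replications(a, b) for a, b in friendships),
    ))
```

With U users and F friendships this yields 2U + 2F continuous replications, so the count grows with both the number of users and the number of friendships. In this sketch only the U filtered replications reference a JavaScript filter function, which CouchDB evaluates in a couchjs query-server process like the ones that appear in the crash log.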
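On the question of longer timeouts: the `5000` at the end of each logged `os_proc` state and the `{<<"timeout">>,5000}` entry in the couch_query_servers state are a 5-second query-server timeout, which matches the "OS process timed out." error. In CouchDB 1.x this appears to be controlled by the `os_process_timeout` key in the `[couchdb]` section, which can be changed at runtime through the `/_config` HTTP API. The sketch below only constructs the request with Python's standard library; the host, the 15000 ms value and the helper name `set_config_request` are assumptions, and an admin account is needed to actually send it. A longer timeout would not by itself explain or prevent couchjs processes exiting with status 137.

```python
"""Sketch: build (but do not send) a CouchDB 1.x /_config update request.

Assumptions: server at http://localhost:5984, admin credentials handled
separately, and [couchdb] os_process_timeout as the relevant key.
"""
import json
import urllib.request


def set_config_request(base_url: str, section: str, key: str, value: str) -> urllib.request.Request:
    # CouchDB 1.x config values are strings, sent as a JSON-encoded string body.
    url = f"{base_url.rstrip('/')}/_config/{section}/{key}"
    body = json.dumps(value).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical value: raise the timeout from 5000 ms to 15000 ms.
req = set_config_request("http://localhost:5984", "couchdb", "os_process_timeout", "15000")
# To apply it against a running server (requires admin rights):
# urllib.request.urlopen(req)
```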
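Wendall's advice is to look further back in the log for the first error rather than at the final crash, and that triage can be done mechanically. The Python sketch below assumes one log entry per line in the `[level] [<pid>] message` form shown in the excerpts, reports the earliest `[error]` entry, and decodes `OS Process died with status: N` messages. It assumes the common convention that a status above 128 means the process was killed by signal N - 128, so 137 would be signal 9 (SIGKILL on Linux): the couchjs process was killed rather than exiting on its own. The script cannot tell who sent the signal, and its signal table uses Linux numbering and is deliberately incomplete.

```python
"""Sketch: find the earliest [error] entry in a CouchDB 1.x log and decode
couchjs exit statuses.

Assumptions: one entry per line in the "[level] [<pid>] message" form,
128 + N meaning "killed by signal N", and Linux signal numbering.
"""
import re

ERROR_LINE = re.compile(r"^\[error\] \[(<[\d.]+>)\] (.*)$")
OS_DIED = re.compile(r"OS Process died with status: (\d+)")
# Partial table of Linux signal numbers.
LINUX_SIGNALS = {1: "SIGHUP", 2: "SIGINT", 6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def decode_exit_status(status: int) -> str:
    """Interpret a port exit status, assuming 128 + N means killed by signal N."""
    if status > 128:
        signum = status - 128
        return LINUX_SIGNALS.get(signum, f"signal {signum}")
    return f"exit code {status}"


def triage(log_text: str) -> dict:
    """Return the earliest error entry and every decoded OS process death."""
    first_error = None
    deaths = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        m = ERROR_LINE.match(line)
        if not m:
            continue  # skip [info] entries and wrapped continuation lines
        pid, message = m.groups()
        if first_error is None:
            first_error = (lineno, pid, message)
        died = OS_DIED.search(message)
        if died:
            deaths.append((pid, decode_exit_status(int(died.group(1)))))
    return {"first_error": first_error, "os_process_deaths": deaths}
```

Run over the whole log file rather than the crash excerpt, the `first_error` line number marks where to start reading, which is the point of Wendall's suggestion.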