[ https://issues.apache.org/jira/browse/COUCHDB-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gunther Gruber updated COUCHDB-2484:
------------------------------------
    Affects Version/s: 1.6.0

> replication crashes
> -------------------
>
>                 Key: COUCHDB-2484
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2484
>             Project: CouchDB
>          Issue Type: Bug
>      Security Level: public(Regular issues)
>          Components: Database Core
>    Affects Versions: 1.6.0
>            Reporter: Gunther Gruber
>            Priority: Minor
>
> We are using CouchDB version 1.6 with 8.3T of data; the biggest database is
> 2.1T. We are currently switching to new hardware with more storage space. We
> copied the files with rsync and started the replication.
> One system is already in sync, the other is still replicating.
> I appreciate that, despite the errors in the log, the first system is now in
> sync.
> The log looks like the following:
> Retrying POST request to http://replication:XXXX/database/_revs_diff in 0.5 seconds due to error req_timedout
> and then
> [Mon, 01 Dec 2014 13:00:28 GMT] [error] [<0.27044.1>] ** Generic server <0.27044.1> terminating
> ** Last message in was {'EXIT',<0.26965.1>,killed}
> ** When Server state == {state,<0.26965.1>,<0.27045.1>,40,
>                          {httpdb,
>                           "http://replication:XXX@XXX.5984/sm_chemie/",
>                           nil,
>                           [{"Accept","application/json"},
>                            {"User-Agent","CouchDB/1.2.0"}],
>                           30000,
>                           [{socket_options,
>                             [{recbuf,262144},
>                              {sndbuf,262144},
>                              {nodelay,true},
>                              {keepalive,true}]}],
>                           10,250,<0.26966.1>,40},
>                          {httpdb,
>                           "http://replication:XXX@XXX:5984/sm_chemie/",
>                           nil,
>                           [{"Accept","application/json"},
>                            {"User-Agent","CouchDB/1.2.0"}],
>                           30000,
>                           [{socket_options,
>                             [{recbuf,262144},
>                              {sndbuf,262144},
>                              {nodelay,true},
>                              {keepalive,true}]}],
>                           10,250,<0.26968.1>,40},
>                          [],nil,nil,nil,
>                          {rep_stats,0,0,0,0,0},
>                          nil,nil,
>                          {batch,[],0}}
> ** Reason for termination ==
> ** killed
> [Mon, 01 Dec 2014 13:00:28 GMT] [error] [<0.27042.1>] {error_report,<0.31.0>,
>     {<0.27042.1>,crash_report,
>      [[{initial_call,
>         {couch_replicator_worker,init,['Argument__1']}},
>        {pid,<0.27042.1>},
>        {registered_name,[]},
>        {error_info,
>         {exit,killed,
>          [{gen_server,terminate,6,
>            [{file,"gen_server.erl"},{line,747}]},
>           {proc_lib,init_p_do_apply,3,
>            [{file,"proc_lib.erl"},{line,227}]}]}},
>        {ancestors,
>         [<0.26965.1>,couch_rep_sup,couch_primary_services,
>          couch_server_sup,<0.32.0>]},
>        {messages,[]},
>        {links,[<0.27043.1>]},
>        {dictionary,
>         [{last_stats_report,{1417,438797,704976}}]},
>        {trap_exit,true},
>        {status,running},
>        {heap_size,377},
>        {stack_size,24},
>        {reductions,372}],
>       []]}}
> It looks to me like a request times out and the replication task then exits.
> I have already played around with the configuration settings, with no
> success. I can provide more information if needed.
>
> /etc/couchdb/local.d/001-user_config.ini
> [couchdb]
> file_compression = snappy
> max_dbs_open = 400
> [httpd]
> bind_address = ::
> server_options = [{backlog, 128}, {acceptor_pool_size, 16}]
> socket_options = [{recbuf, 262144}, {sndbuf, 262144}, {nodelay, true}, {keepalive, true}]
> [couch_httpd_auth]
> secret =
> [log_level_by_module]
> couch_httpd = warning
> couch_replicator = debug
> couch_query_servers = warning
> [daemons]
> httpsd = {couch_httpd, start_link, [https]}
> [ssl]
> cert_file = /etc/couchdb/ssl/certs/couchdb-couch1.prime.adns.de.pem
> key_file = /etc/couchdb/ssl/private/couchdb-couch1.prime.adns.de.pem
> verify_ssl_certificates = false
> [replicator]
> worker_batch_size = 2000
> worker_processes = 40
> http_connections = 40
> socket_options = [{recbuf, 262144}, {sndbuf, 262144}, {nodelay, true}, {keepalive, true}]
>
> /etc/default/couchdb
> # Sourced by init script for configuration.
> COUCHDB_USER=couchdb
> COUCHDB_STDOUT_FILE=/dev/null
> COUCHDB_STDERR_FILE=/dev/null
> COUCHDB_RESPAWN_TIMEOUT=5
> COUCHDB_OPTIONS=
> # 32 Threads to handle I/O
> export ERL_FLAGS="+A 32"
> # 8192 open files
> export ERL_MAX_PORTS=8192
> ulimit -n 8192

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
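One way to check the timeout theory outside the replicator is to time a single `_revs_diff` request against the target and compare it with the 30000 in the state dump, which appears to be the per-request timeout in milliseconds. The following is only a minimal sketch using the Python 3 standard library. The helper names, the example URL, and the document and revision IDs are hypothetical, and authentication is left out, so the credentials from the replication URL would have to be supplied separately.

```python
import json
import time
import urllib.request


def build_revs_diff_request(db_url, revs_by_doc):
    """Build a POST request for <db_url>/_revs_diff.

    revs_by_doc maps document IDs to lists of revision IDs,
    which is the JSON body shape that _revs_diff expects.
    """
    body = json.dumps(revs_by_doc).encode("utf-8")
    return urllib.request.Request(
        db_url.rstrip("/") + "/_revs_diff",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def time_revs_diff(db_url, revs_by_doc, timeout_s=120.0):
    """Send one _revs_diff request and return (elapsed_seconds, parsed_reply).

    timeout_s is deliberately longer than 30 s so that a slow reply is
    measured rather than cut off (an assumption for this experiment).
    """
    request = build_revs_diff_request(db_url, revs_by_doc)
    start = time.monotonic()
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        reply = json.load(response)
    return time.monotonic() - start, reply


if __name__ == "__main__":
    # Hypothetical target and revisions; replace with real values.
    elapsed, reply = time_revs_diff(
        "http://localhost:5984/sm_chemie/",
        {"example-doc": ["1-0123456789abcdef0123456789abcdef"]},
    )
    print("reply after %.1f s: %r" % (elapsed, reply))
```

If a batch of the size the replicator sends regularly takes longer than 30 seconds to answer, that would be consistent with the `req_timedout` retries in the log; it would not by itself explain why the worker is killed.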