Wanted to forward this issue on, for comment. I see this quite a lot in our issue tracker these days.

IBM folks who have more experience with CouchDB in Docker, do you see this happening? Because I certainly do. And, at this point, my knee-jerk response is "pre-FDB CouchDB works terribly in Docker unless you really, really know and trust your setup, and you probably don't actually know your setup, because you're using a cloud provider that has *terrible* limits on I/O or disk they refuse to publish, and you'll get screwed by them, and they won't help you understand that's the issue, so there's very little you can do to identify the source of the problem. So run it without Docker instead." Which isn't very helpful, but it's gotten a lot of (paying) clients out of scrapes.


I would love it if someone with a more positive experience with CouchDB and Docker can reply to this issue (and me) and provide some advice for the community in general.


Thanks,

Joan "when you have a hammer, everything looks like a nail" Touzet



-------- Forwarded Message --------
Subject: [apache/couchdb] Couchdb stops writting logs and and then after some time we have always timeouts (#3083)
Date:   Mon, 17 Aug 2020 14:40:36 -0700
From:   raulmartinezr <notificati...@github.com>
Reply-To: apache/couchdb <reply+aaa3njauzednyf6bq3oxmh55i3nfjevbnhhcrec...@reply.github.com>
To:     apache/couchdb <couc...@noreply.github.com>
CC:     Subscribed <subscri...@noreply.github.com>



   Description

We use *dockerized couchdb* in a local development environment to *test applications*.
The procedure for each test:

 * create a database in couchdb
 * test whatever we need to test inside this DB
 * delete it once it is no longer needed

We have *approx. 70 tests*.

For each database a few documents are created

 * *less than 10 documents* usually (max 50 when we test indexes
   created by ddoc)
 * *2 design documents,* one with two mango indexes and the other one
   with 1 mango index

If we execute the tests module by module (each module contains between 1 and 10 tests), there is no issue. But if we execute all of them in one run (sequentially, without parallelism), then at some point the tests hang.
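
The per-test cycle described above can be sketched roughly as follows. This is only an illustration, not the actual test suite: the helper names (`build_test_cycle`, `run_all`), the database names and the document contents are hypothetical, the port is taken from the `[chttpd]` configuration further down, and the databases are assumed to be partitioned because the logs show `/_partition/` requests. Authentication is left out and would have to be added for a server with an admin user.

```python
import json
import urllib.request

# Assumption: CouchDB listens on the [chttpd] port from the configuration dump.
BASE_URL = "http://127.0.0.1:5984"


def build_test_cycle(db_name, docs, base_url=BASE_URL):
    """Build the HTTP requests for one test: create DB, insert docs, delete DB."""

    def req(method, path, body=None):
        data = json.dumps(body).encode("utf-8") if body is not None else None
        return urllib.request.Request(
            base_url + path,
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )

    return [
        req("PUT", f"/{db_name}?partitioned=true"),          # create the database
        req("POST", f"/{db_name}/_bulk_docs", {"docs": docs}),  # insert a few documents
        req("DELETE", f"/{db_name}"),                        # drop it again
    ]


def run_all(n_tests=70, opener=urllib.request.urlopen, timeout=10):
    """Run the cycle sequentially, one database per test, without parallelism."""
    for i in range(n_tests):
        # Hypothetical documents; partitioned databases need "partition:id" ids.
        docs = [{"_id": f"iam:doc-{j}", "n": j} for j in range(10)]
        for request in build_test_cycle(f"test-db-{i}", docs):
            with opener(request, timeout=timeout) as resp:
                resp.read()
```

Calling `run_all()` against a disposable container issues roughly the same create/populate/delete sequence as the full test run, which may help to reproduce the hang outside the test framework.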

*We observed*

 * Couchdb stops writing logs (CPU consumption decreases)
 * We still get responses to queries for some time, approx. 1 min
 * Then we get no responses from couchdb anymore
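
To pin down when the responses stop relative to the last log line, a small probe can be run next to the tests. The sketch below is an assumption-laden illustration: it polls CouchDB's `/_up` endpoint on the `[chttpd]` port, the function names `probe` and `first_failure` are hypothetical, and depending on the setup the request may need credentials.

```python
import time
import urllib.request


def probe(url="http://127.0.0.1:5984/_up", timeout=5.0,
          opener=urllib.request.urlopen):
    """Send one GET and return (ok, seconds); ok is False on timeout or error."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            resp.read()
            ok = resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        ok = False
    return ok, time.monotonic() - start


def first_failure(samples):
    """Return the index of the first failed probe in [(ok, seconds), ...], or None."""
    for i, (ok, _) in enumerate(samples):
        if not ok:
            return i
    return None
```

Collecting one `probe()` result per second into a list and passing it to `first_failure` gives the point at which the server stopped answering, which can then be compared with the timestamp of the last log entry.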

*Example* (It's not always the same)

 * Last logs written (querying a view with /master/_partition/iam/_find).
   After that, CPU consumption of couchdb goes down.

[debug] 2020-08-17T19:30:05.663402Z nonode@nohost<0.20917.0>  a13fe062bc no record of user admin
[debug] 2020-08-17T19:30:05.663457Z nonode@nohost<0.20917.0>  a13fe062bc timeout 600
[debug] 2020-08-17T19:30:05.663495Z nonode@nohost<0.20917.0>  a13fe062bc Successful cookie auth as:"admin"
[notice] 2020-08-17T19:30:05.665187Z nonode@nohost<0.20917.0>  a13fe062bc 127.0.0.1:5984 172.28.0.1 admin POST /master/_partition/iam/_find 200 ok 2
[debug] 2020-08-17T19:30:05.677477Z nonode@nohost<0.20917.0>  be296403e4 no record of user admin
[debug] 2020-08-17T19:30:05.677539Z nonode@nohost<0.20917.0>  be296403e4 timeout 600
[debug] 2020-08-17T19:30:05.677570Z nonode@nohost<0.20917.0>  be296403e4 Successful cookie auth as:"admin"

 * Request captured with tcpdump (image):
   <https://user-images.githubusercontent.com/4292375/90440621-d4bd7d00-e0d7-11ea-9d9a-6710dc422e5f.png>

 * After some time (1 min 9 s), timeouts (image):
   <https://user-images.githubusercontent.com/4292375/90440906-4bf31100-e0d8-11ea-8d54-40ebb6035337.png>

*HW and environment*

 * Host machine runs Ubuntu 20.04, with 8 cores, 16 GB RAM and a 512 GB
   SSD (250 GB free)
 * Docker container has no CPU/memory/disk limits
 * Couchdb configured as single node (full configuration below)

This is how the container processes look. Erlang (beam.smp) consumes the most, with peaks of 70% CPU (image): <https://user-images.githubusercontent.com/4292375/90441456-4649fb00-e0d9-11ea-9801-a8aae75c6cf4.png>

Threads of beam.smp, just when the issue happens:
I monitored the threads during the whole process. Just before the crash, the scheduler threads seem to become more active.

top - 20:55:05 up  2:58,  0 users,  load average: 3.93, 2.31, 2.35
Threads:  46 total,   0 running,  46 sleeping,   0 stopped,   0 zombie
%Cpu(s): 46.2 us,  4.3 sy,  0.0 ni, 49.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem:   15687.8 total,   1947.6 free,   7294.3 used,   6445.9 buff/cache
MiB Swap:    980.0 total,    980.0 free,      0.0 used.   7200.1 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     54 couchdb   20   0 4568544  62092  10904 S  40.0   0.4   0:07.25 2_scheduler
     53 couchdb   20   0 4568544  62092  10904 S  33.3   0.4   0:20.74 1_scheduler
     55 couchdb   20   0 4568544  62092  10904 S  33.3   0.4   0:06.63 3_scheduler
     56 couchdb   20   0 4568544  62092  10904 S  20.0   0.4   0:05.66 4_scheduler
     37 couchdb   20   0 4568544  62092  10904 S   6.7   0.4   0:00.03 async_2
      6 couchdb   20   0 4568544  62092  10904 S   0.0   0.4   0:00.03 beam.smp
     34 couchdb   20   0 4568544  62092  10904 S   0.0   0.4   0:00.00 sys_sig_dispatc
     35 couchdb   20   0 4568544  62092  10904 S   0.0   0.4   0:00.00 sys_msg_dispatc
     36 couchdb   20   0 4568544  62092  10904 S   0.0   0.4   0:00.18 async_1
     38 couchdb   20   0 4568544  62092  10904 S   0.0   0.4   0:00.14 async_3
.....

*Any idea what the cause could be? Any hint would be appreciated.*


   Steps to Reproduce

There is no fixed trigger for the issue.


   Expected Behaviour

We would expect couchdb to handle this load even with docker. It's not heavy: during tests we have at most 15 operations per second.


   Your Environment

{"couchdb":"Welcome","version":"3.1.0","git_sha":"ff0feea20","uuid":"b17edbd1de7d1504022d6f359ff9a4f8","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}

 * CouchDB version used: 3.1.0
 * Browser name and version: Not relevant
 * Operating system and version: Couchdb docker@Ubuntu20.04


   Additional Context

*Full couchdb configuration*

Configuration Settings:
  [admins] admin="******"
  [attachments] compressible_types="text/*, application/javascript, application/json, application/xml"
  [attachments] compression_level="8"
  [chttpd] backlog="512"
  [chttpd] bind_address="any"
  [chttpd] max_db_number_for_dbs_info_req="100"
  [chttpd] port="5984"
  [chttpd] prefer_minimal="Cache-Control, Content-Length, Content-Range, Content-Type, ETag, Server, Transfer-Encoding, Vary"
  [chttpd] require_valid_user="false"
  [chttpd] server_options="[{backlog, 512}, {acceptor_pool_size, 64}, {max, 4096}]"
  [chttpd] socket_options="[{sndbuf, 262144}, {nodelay, true}]"
  [cluster] n="3"
  [cluster] q="2"
  [cors] credentials="false"
  [couch_httpd_auth] allow_persistent_cookies="true"
  [couch_httpd_auth] auth_cache_size="50"
  [couch_httpd_auth] authentication_db="_users"
  [couch_httpd_auth] authentication_redirect="/_utils/session.html"
  [couch_httpd_auth] iterations="10"
  [couch_httpd_auth] require_valid_user="false"
  [couch_httpd_auth] secret="00464db7ba8beb6a5915e4f5dbd03a49"
  [couch_httpd_auth] timeout="600"
  [couch_peruser] database_prefix="userdb-"
  [couch_peruser] delete_dbs="false"
  [couch_peruser] enable="false"
  [couchdb] attachment_stream_buffer_size="4096"
  [couchdb] changes_doc_ids_optimization_threshold="100"
  [couchdb] database_dir="./data"
  [couchdb] default_engine="couch"
  [couchdb] default_security="admin_only"
  [couchdb] file_compression="snappy"
  [couchdb] max_dbs_open="10000"
  [couchdb] max_document_size="8000000"
  [couchdb] os_process_timeout="20000"
  [couchdb] single_node="true"
  [couchdb] users_db_security_editable="false"
  [couchdb] uuid="b17edbd1de7d1504022d6f359ff9a4f8"
  [couchdb] view_index_dir="./data"
  [couchdb_engines] couch="couch_bt_engine"
  [csp] enable="true"
  [fabric] request_timeout="infinity"
  [feature_flags] partitioned||*="true"
  [httpd] allow_jsonp="false"
  [httpd] authentication_handlers="{couch_httpd_auth, cookie_authentication_handler}, {couch_httpd_auth, default_authentication_handler}"
  [httpd] bind_address="127.0.0.1"
  [httpd] enable_cors="false"
  [httpd] enable_xframe_options="false"
  [httpd] max_http_request_size="4294967296"
  [httpd] port="5986"
  [httpd] secure_rewrites="true"
  [httpd] socket_options="[{sndbuf, 262144}]"
  [indexers] couch_mrview="true"
  [ioq] concurrency="10"
  [ioq] ratio="0.01"
  [ioq.bypass] compaction="false"
  [ioq.bypass] os_process="true"
  [ioq.bypass] read="true"
  [ioq.bypass] shard_sync="false"
  [ioq.bypass] view_update="true"
  [ioq.bypass] write="true"
  [log] level="debug"
  [log] writer="stderr"
  [query_server_config] os_process_limit="2000"
  [query_server_config] os_process_soft_limit="1000"
  [query_server_config] reduce_limit="true"
  [replicator] connection_timeout="30000"
  [replicator] http_connections="20"
  [replicator] interval="60000"
  [replicator] max_churn="20"
  [replicator] max_jobs="500"
  [replicator] retries_per_request="5"
  [replicator] socket_options="[{keepalive, true}, {nodelay, false}]"
  [replicator] ssl_certificate_max_depth="3"
  [replicator] startup_jitter="5000"
  [replicator] verify_ssl_certificates="false"
  [replicator] worker_batch_size="500"
  [replicator] worker_processes="4"
  [ssl] port="6984"
  [uuids] algorithm="sequential"
  [uuids] max_count="1000"
  [vendor] name="The Apache Software Foundation"

Some errors were identified during startup. They do not seem relevant to this case.

 * |_users db| does not exist (but seems to be created afterwards)

[error] 2020-08-17T19:18:47.731011Z nonode@nohost emulator -------- Error in process <0.372.0> with exit value:
{database_does_not_exist,[{mem3_shards,load_shards_from_db,"_users",[{file,"src/mem3_shards.erl"},{line,399}]},{mem3_shards,load_shards_from_disk,1,[{file,"src/mem3_shards.erl"},{line,374}]},{mem3_shards,load_shards_from_disk,2,[{file,"src/mem3_shards.erl"},{line,403}]},{mem3_shards,for_docid,3,[{file,"src/mem3_shards.erl"},{line,96}]},{fabric_doc_open,go,3,[{file,"src/fabric_doc_open.erl"},{line,39}]},{chttpd_auth_cache,ensure_auth_ddoc_exists,2,[{file,"src/chttpd_auth_cache.erl"},{line,198}]},{chttpd_auth_cache,listen_for_changes,1,[{file,"src/chttpd_auth_cache.erl"},{line,145}]}]}

 * I suppose this has no effect as long as it's configured as single
   node |[couchdb] single_node="true"|

[error] 2020-08-17T19:18:47.799365Z nonode@nohost<0.457.0>  -------- Request to create N=3 DB but only 1 node(s)
[error] 2020-08-17T19:18:47.812920Z nonode@nohost<0.457.0>  -------- Request to create N=3 DB but only 1 node(s)
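
These two errors line up with `[cluster] n="3"` in the configuration dump above while only one node exists. As a rough sketch (not part of CouchDB; the names `parse_dump` and `replicas_exceed_nodes` are hypothetical), the `[section] key="value"` dump format can be parsed and checked for that mismatch like this. Lines that were wrapped by the mail client simply fail to match and are skipped.

```python
import re

# Matches one dump line of the form:   [section] key="value"
LINE_RE = re.compile(r'^\s*\[([^\]]+)\]\s+([^=\s]+)="(.*)"\s*$')


def parse_dump(text):
    """Parse a '[section] key="value"' dump into {section: {key: value}}."""
    config = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            section, key, value = match.groups()
            config.setdefault(section, {})[key] = value
    return config


def replicas_exceed_nodes(config, node_count=1):
    """True if [cluster] n is set and asks for more replicas than there are nodes."""
    n = config.get("cluster", {}).get("n")
    return n is not None and int(n) > node_count
```

For the dump in this report, `replicas_exceed_nodes(parse_dump(dump_text))` would be true, matching the "N=3 DB but only 1 node(s)" message; whether that has any bearing on the hang is a separate question.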

—
View it on GitHub <https://github.com/apache/couchdb/issues/3083>.
