schneuwlym opened a new issue #3571:
URL: https://github.com/apache/couchdb/issues/3571


   
   ## Description
   
   
   We have an issue with our CouchDB 3.1.1. We are using the default compaction configuration, and it seems to work fine until the database reaches a certain number of documents (~76K). Then the compaction dies and is no longer able to finish. It is restarted every 2 seconds and always dies immediately. So far the problem is consistent, and I haven't found any way to fix it short of deleting the database.
   
   I read some other compaction-related issues, but this node has only ever run version 3.1.1: no upgrade, no migration, nothing similar.
   
   What I tried so far:
   * I removed the compaction files manually and restarted CouchDB. Compaction fails again.
   * I rebooted the node. Compaction fails again.
   * Following issues #3292 and #2941, I built my own version based on 3.1.1 with the following two changes. Compaction still fails.
     * fix race condition (#3150)
     * add remonitor code to DOWN message (#3144)
   * At first the slack compactor always failed; after I disabled it, the ratio_dbs compactor failed as well.
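   For reference, disabling a compactor channel is done through the smoosh channel lists; a local.ini sketch consistent with the smoosh section of the config pasted below (the stock defaults also include `slack_dbs`/`slack_views`):
   ```ini
   [smoosh]
   ; slack channels removed from the defaults to disable the slack compactor
   db_channels = upgrade_dbs,ratio_dbs
   view_channels = upgrade_views,ratio_views
   ```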
   
   This is the log, which is repeated every two seconds:
   ```
    [notice] 2021-05-19T14:42:38.848090Z [email protected] <0.460.0> -------- ratio_dbs: adding <<"shards/80000000-ffffffff/directory.1621404274">> to internal compactor queue with priority 2.100073355455779
    [info] 2021-05-19T14:42:38.848533Z [email protected] <0.5146.0> -------- Starting compaction for db "shards/80000000-ffffffff/directory.1621404274" at 40726
    [notice] 2021-05-19T14:42:38.848615Z [email protected] <0.460.0> -------- ratio_dbs: Starting compaction for shards/80000000-ffffffff/directory.1621404274 (priority 2.100073355455779)
    [notice] 2021-05-19T14:42:38.849705Z [email protected] <0.460.0> -------- ratio_dbs: Started compaction for shards/80000000-ffffffff/directory.1621404274
    [warning] 2021-05-19T14:42:38.893633Z [email protected] <0.460.0> -------- exit for compaction of ["shards/80000000-ffffffff/directory.1621404274"]: {undef,[{math,ceil,[1.6],[]},{couch_emsort,num_merges,2,[{file,"src/couch_emsort.erl"},{line,366}]},{couch_bt_engine_compactor,sort_meta_data,1,[{file,"src/couch_bt_engine_compactor.erl"},{line,508}]},{lists,foldl,3,[{file,"lists.erl"},{line,1263}]},{couch_bt_engine_compactor,start,4,[{file,"src/couch_bt_engine_compactor.erl"},{line,75}]}]}
    [error] 2021-05-19T14:42:38.894691Z [email protected] emulator -------- Error in process <0.5148.0> on node '[email protected]' with exit value:
    {undef,[{math,ceil,[1.6],[]},{couch_emsort,num_merges,2,[{file,"src/couch_emsort.erl"},{line,366}]},{couch_bt_engine_compactor,sort_meta_data,1,[{file,"src/couch_bt_engine_compactor.erl"},{line,508}]},{lists,foldl,3,[{file,"lists.erl"},{line,1263}]},{couch_bt_engine_compactor,start,4,[{file,"src/couch_bt_engine_compactor.erl"},{line,75}]}]}
    [info] 2021-05-19T14:42:38.894453Z [email protected] <0.226.0> -------- db shards/80000000-ffffffff/directory.1621404274 died with reason {undef,[{math,ceil,[1.6],[]},{couch_emsort,num_merges,2,[{file,"src/couch_emsort.erl"},{line,366}]},{couch_bt_engine_compactor,sort_meta_data,1,[{file,"src/couch_bt_engine_compactor.erl"},{line,508}]},{lists,foldl,3,[{file,"lists.erl"},{line,1263}]},{couch_bt_engine_compactor,start,4,[{file,"src/couch_bt_engine_compactor.erl"},{line,75}]}]}
   ```
   
   When the problem occurs, inserting data is still possible, but I often get the following error message (I'm using python-cloudant):
   ```
    500 Server Error: Internal Server Error unknown_error undefined for url: http://localhost:5984/directory
   ```
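   As a client-side stopgap, intermittent 500s like the one above can be retried. A minimal sketch (the `retry` helper is hypothetical, not a python-cloudant API):
    ```python
    import time


    def retry(fn, attempts=3, delay=0.5):
        """Call fn(); on failure, wait `delay` seconds and try again, up to `attempts` times."""
        for attempt in range(attempts):
            try:
                return fn()
            except Exception:
                if attempt == attempts - 1:
                    raise
                time.sleep(delay)


    # usage: retry(lambda: create_documents(start=0, thread_id=0))
    ```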
   
   ## Steps to Reproduce
   
   
   1. Start with a clean database.
   2. Create a script that creates documents in an endless loop (pure JSON, no attachments, just one revision).
   3. After around 76K documents the compactor starts to fail.
   4. Inserts are still possible, but time and again an insert fails with the 500 Server Error shown above.
   
   I ran the stress test above on 3 nodes in parallel. All 3 nodes started to fail at around the same number of documents (70K-80K).
   * In the first node, I created the documents single threaded
   * In the second node, I created the documents using two threads
   * In the third node, I created the documents using four threads
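   The per-thread numbering in the script below cannot produce colliding ids, because the thread id becomes the first character of each document number. That scheme, as a standalone sketch:
    ```python
    def make_doc_number(thread_id, i):
        # thread id prefix + zero-padded counter, as in the reproduction script
        return '{}{:06}'.format(thread_id, i)
    ```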
   
   Below is the script I used to reproduce the issue in my setup:
    ```python
   #!/usr/bin/env python
   
   import signal
   import sys
   from cloudant.client import CouchDB
   from cloudant.document import Document
   from copy import deepcopy
   from threading import Thread
   
   
   USERNAME = 'admin'
   PASSWORD = 'admin'
   COUCHDB_URL = 'http://localhost:5984'
   DB_NAME = 'directory'
   
   
    cdb = CouchDB(USERNAME, PASSWORD, url=COUCHDB_URL, connect=True, auto_renew=True)
   
   account_skeletton = { 'parameter 1': 0,
                         'parameter 2': True,
                         'parameter 3': '',
                         'parameter 4': '',
                         'parameter 5': [],
                         'parameter 6': [],
                         'description': '',
                         'enabled': True,
                         'firstname': '',
                         'parameter 7': False,
                         'lastname': '',
                         'parameter 8': '',
                         'number': '',
                          'parameter 9': '9301162291d5a0480270d97d6c4a6da3edd75aa5',
                         'parameter 10': 'cos02',
                         'parameter 11': '112233',
                         'parameter 12': 1620118266.572422,
                         'parameter 13': 0,
                         'parameter 14': 0.0,
                         'parameter 15': False,
                         'parameter 16': 4,
                         'parameter 17': '',
                         'parameter 18': '',
                         'parameter 19': 'user',
                         'userid': '',
                         'parameter 20': '',
                         'parameter 21': '',
                         'parameter 22': True}
   
   
   if DB_NAME not in cdb.all_dbs():
       cdb.create_database(DB_NAME)
   
   
   def signal_handler(sig, frame):
       print('You pressed Ctrl+C!')
       sys.exit(0)
   
   
   def create_documents(start=0, thread_id=0):
       try:
           for i in xrange(start, 999999):
               number = '{}{:06}'.format(thread_id, i)
               print('create_documents: Creating document {}'.format(number))
               with Document(cdb[DB_NAME], number) as document:
                   document.update(deepcopy(account_skeletton))
                   document['firstname'] = 'FN {}'.format(number)
                   document['lastname'] = 'LN {}'.format(number)
                   document['number'] = number
                   document['userid'] = number
       except Exception as err:
           print('create_documents: {}'.format(err))
   
   
    def create_documents_threaded(threads=2):
        for i in xrange(threads):
            t = Thread(target=create_documents, args=(0, i))
            t.daemon = True
            t.start()

        signal.signal(signal.SIGINT, signal_handler)
        print('Press Ctrl+C')
        signal.pause()


    # entry point; run with the desired number of threads (I used 1, 2 and 4)
    if __name__ == '__main__':
        create_documents_threaded(threads=2)
    ```
   
   ## Expected Behaviour
   
   
   Compaction doesn't fail :-)
   
   ## Your Environment
   
   
   * CouchDB version used:
     `{"couchdb":"Welcome","version":"3.1.1","git_sha":"ce596c65d","uuid":"08fb7cd0a10f35f6215a531742f7b356","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}`
   * python-cloudant: 2.14.0
   * Python 2.7
   * Operating system and version:
     * Own Linux distribution
   * CouchDB running in a VM
     * Single core (also changed to 2 cores, no difference)
     * 1GB RAM (also increased it, no difference)
   * To trigger this issue I used an isolated node: no replication, no clustering
   
   ## Additional Context
   
   
   Below you can find the configuration; most of it is default:
   ```
    curl http://admin:admin@localhost:5984/_node/[email protected]/_config | python -m json.tool
   {
       "admins": {
           "admin": 
"-pbkdf2-d5b128e39ebe61b4f50fb9c2e3241c0ea1bc28f9,6b6e6d21c67f685f753d8fa1fe72db71,10"
       },
       "attachments": {
           "compressible_types": "text/*, application/javascript, 
application/json, application/xml",
           "compression_level": "8"
       },
       "chttpd": {
           "backlog": "512",
           "bind_address": "0.0.0.0",
           "max_db_number_for_dbs_info_req": "100",
           "port": "5984",
           "prefer_minimal": "Cache-Control, Content-Length, Content-Range, 
Content-Type, ETag, Server, Transfer-Encoding, Vary",
           "require_valid_user": "false",
           "server_options": "[{recbuf, undefined}]",
           "socket_options": "[{sndbuf, 262144}, {nodelay, true}]"
       },
       "cluster": {
           "n": "3",
           "q": "2"
       },
       "cors": {
           "credentials": "false"
       },
       "couch_httpd_auth": {
           "allow_persistent_cookies": "true",
           "auth_cache_size": "50",
           "authentication_db": "_users",
           "authentication_redirect": "/_utils/session.html",
           "iterations": "10",
           "require_valid_user": "false",
           "secret": "a0ec90afc5f896e3cf90e8c4adc9dafa",
           "timeout": "600"
       },
       "couch_peruser": {
           "database_prefix": "userdb-",
           "delete_dbs": "false",
           "enable": "false"
       },
       "couchdb": {
           "attachment_stream_buffer_size": "4096",
           "changes_doc_ids_optimization_threshold": "100",
           "database_dir": "/var/crypt/couchdb/couchdb",
           "default_engine": "couch",
           "default_security": "everyone",
           "file_compression": "snappy",
           "max_dbs_open": "500",
           "max_document_size": "8000000",
           "os_process_timeout": "5000",
           "single_node": "true",
           "users_db_security_editable": "false",
           "uuid": "08fb7cd0a10f35f6215a531742f7b356",
           "view_index_dir": "/var/crypt/couchdb/couchdb"
       },
       "couchdb_engines": {
           "couch": "couch_bt_engine"
       },
       "csp": {
           "enable": "true"
       },
       "feature_flags": {
           "partitioned||*": "true"
       },
       "httpd": {
           "allow_jsonp": "false",
           "authentication_handlers": "{couch_httpd_auth, 
cookie_authentication_handler}, {couch_httpd_auth, 
default_authentication_handler}",
           "bind_address": "127.0.0.1",
           "enable_cors": "false",
           "enable_xframe_options": "false",
           "max_http_request_size": "4294967296",
           "port": "5986",
           "secure_rewrites": "true",
           "socket_options": "[{sndbuf, 262144}]"
       },
       "indexers": {
           "couch_mrview": "true"
       },
       "ioq": {
           "concurrency": "10",
           "ratio": "0.01"
       },
       "ioq.bypass": {
           "compaction": "false",
           "os_process": "true",
           "read": "true",
           "shard_sync": "false",
           "view_update": "true",
           "write": "true"
       },
       "log": {
           "file": "/var/log/couchdb/couchdb.log",
           "level": "info",
           "writer": "file"
       },
       "query_server_config": {
           "os_process_limit": "100",
           "reduce_limit": "true"
       },
       "replicator": {
           "connection_timeout": "30000",
           "http_connections": "20",
           "interval": "60000",
           "max_churn": "20",
           "max_jobs": "500",
           "retries_per_request": "5",
           "socket_options": "[{keepalive, true}, {nodelay, false}]",
           "ssl_certificate_max_depth": "3",
           "startup_jitter": "5000",
           "verify_ssl_certificates": "true",
           "worker_batch_size": "500",
           "worker_processes": "4"
       },
       "smoosh": {
           "db_channels": "upgrade_dbs,ratio_dbs",
           "view_channels": "upgrade_views,ratio_views"
       },
       "ssl": {
           "port": "6984"
       },
       "uuids": {
           "algorithm": "sequential",
           "max_count": "1000"
       },
       "vendor": {
           "name": "The Apache Software Foundation"
       }
   }
   ```

