notequal1 opened a new issue #3614:
URL: https://github.com/apache/couchdb/issues/3614
## Description
Hi everyone. I have an interesting issue that I would say is not mission critical, but it is enough of an annoyance that I'd like to see if there is a way to resolve it. After I create or update a view on a modestly sized database (~200 MB, ~550,000 documents) and then execute the view, I receive the following error after several seconds:

`{"error":"os_process_error","reason":"{exit_status,1}","ref":3722911048}`

Interestingly, I can watch the index for the view being built through the active tasks in Fauxton, but the request still fails after several seconds. If I continue to manually re-execute the view, the same thing keeps happening until the index is completely built. So, in the end, I do get a working index, but I have to repeatedly execute the view to get it!
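To make the manual workaround concrete, here is roughly what I end up doing by hand, sketched as a small Node script. This is only an illustration, not the exact tooling I use: it assumes Node 18+ with its built-in fetch, and the host, credentials, and database name are placeholders.

```js
// Query the view; on os_process_error, check /_active_tasks to confirm the
// indexer is still making progress, then try again a bit later.
const BASE = "https://my_server.my_org.com:6984"; // placeholder host
const headers = {
  "Authorization": "Basic " + Buffer.from("admin:password").toString("base64")
};

async function queryUntilBuilt() {
  for (;;) {
    const res = await fetch(
      `${BASE}/test/_design/rpts-prt/_view/rpts-prt?descending=true&reduce=false&limit=101`,
      { headers });
    if (res.ok) return res.json();                 // index finished, rows returned

    console.log("view failed:", await res.text()); // os_process_error / exit_status 1

    // The indexer task keeps running despite the error.
    const tasks = await (await fetch(`${BASE}/_active_tasks`, { headers })).json();
    for (const t of tasks) {
      if (t.type === "indexer") console.log(t.design_document, `${t.progress}%`);
    }
    await new Promise(r => setTimeout(r, 60000));  // wait a minute, then retry
  }
}

queryUntilBuilt().then(v => console.log("rows:", v.total_rows));
```

The indexer task it prints is the same one I watch in Fauxton's active tasks view.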
Here is where I think it gets even more head-scratchingly interesting: this problem occurs on a CouchDB 3.1.1 instance on Ubuntu 20.04, but it does not occur on a CouchDB 3.1.1 instance on Ubuntu 18.04. I set up the same test database on Ubuntu 18.04 (not replicated from the problem database), used the same view, and the index built as expected with no error returned (except for the timeout, which is expected). The CouchDB instance on 18.04 was automatically upgraded from 2.3.1 to 3.0 -> 3.1 -> 3.1.1 using the standard apt-get mechanism. The CouchDB instance on 20.04 was installed as version 3.1.1 from the beginning on a fresh OS install. In both cases I have not modified the default configuration files, apart from adding security certificates and changing the admin password.
Here is the output of the CouchDB log:

```
[info] 2021-06-09T04:15:44.376911Z [email protected] <0.6608.1814> -------- Starting index update for db: shards/00000000-7fffffff/test.1616963133 idx: _design/rpts-prt
[info] 2021-06-09T04:15:44.392423Z [email protected] <0.6608.1814> -------- Index update finished for db: shards/00000000-7fffffff/test.1616963133 idx: _design/rpts-prt
[notice] 2021-06-09T15:57:28.554997Z [email protected] <0.2721.1841> dcac6b7359 my_server.my_org.com:6984 my_remote_ip my_couch_user GET /test/_design/rpts-prt/_view/rpts-prt?descending=true&reduce=false&skip=0&limit=101 200 ok 56
[notice] 2021-06-09T16:43:53.523438Z [email protected] <0.32134.1842> 48a2bac4ae my_server.my_org.com:6984 my_remote_ip my_couch_user GET /test/_design/rpts-prt/_view/rpts-prt?descending=true&reduce=false&skip=0&limit=101 200 ok 9
[notice] 2021-06-09T16:44:09.288677Z [email protected] <0.32134.1842> 37c1c8778b my_server.my_org.com:6984 my_remote_ip my_couch_user PUT /test/_design/rpts-prt 201 ok 83
[info] 2021-06-09T16:44:09.346994Z [email protected] <0.24246.1842> -------- Starting index update for db: shards/00000000-7fffffff/test.1616963133 idx: _design/rpts-prt
[info] 2021-06-09T16:44:09.405678Z [email protected] <0.1988.1843> -------- Starting index update for db: shards/80000000-ffffffff/test.1616963133 idx: _design/rpts-prt
[notice] 2021-06-09T16:44:34.453822Z [email protected] <0.12722.1842> 8ef8f3138a my_server.my_org.com:6984 my_remote_ip my_couch_user GET /test/_design/rpts-prt/_view/rpts-prt?descending=true&skip=0&limit=101&reduce=false 500 ok 24997
[info] 2021-06-09T16:45:09.355555Z [email protected] <0.24246.1842> -------- Starting index update for db: shards/00000000-7fffffff/test.1616963133 idx: _design/rpts-prt
[info] 2021-06-09T16:46:35.794787Z [email protected] <0.24246.1842> -------- Starting index update for db: shards/00000000-7fffffff/test.1616963133 idx: _design/rpts-prt
[info] 2021-06-09T16:46:35.796524Z [email protected] <0.8870.1843> -------- Starting index update for db: shards/80000000-ffffffff/test.1616963133 idx: _design/rpts-prt
[notice] 2021-06-09T16:47:00.565525Z [email protected] <0.26500.1842> fa1ec82354 my_server.my_org.com:6984 my_remote_ip my_couch_user GET /test/_design/rpts-prt/_view/rpts-prt 500 ok 24773
```
Here is the output snippet from the syslog:

```
Jun 9 16:44:34 my_server couchdb[1661763]: out of memory
Jun 9 16:44:34 my_server couchdb[1661762]: out of memory
Jun 9 16:45:33 my_server couchdb[1661764]: out of memory
Jun 9 16:47:00 my_server couchdb[1661016]: out of memory
Jun 9 16:47:00 my_server couchdb[1661017]: out of memory
```
Here is the view in question:
```js
function (doc) {
  // View: rpts-prt
  //
  // Emit every report entry in the document with any non-numeric
  // associated metadata included. The key will consist of the
  // platform ID, the report type, and a six-element date array which
  // includes the time in UTC.

  // Split the document ID into the platform ID and date/time
  // components.
  var _id_splt = doc._id.split("__");
  var p = _id_splt[1];
  var t = _id_splt[0];

  // Use this flag to help determine if the document is valid.
  var is_valid = true;

  // Make sure there are only two components of the document ID.
  if (_id_splt.length != 2) {
    is_valid = false;
  }

  // Determine if the platform ID component equals the "GBL" or "GRP"
  // value, representing a global or group document, respectively.
  // Data should not be saved using either of those values, anyway.
  if (p == "GBL" || p == "GRP") {
    is_valid = false;
  }

  // Determine that the date component in the "_id" value represents
  // an actual date.
  var date = null;
  if (t.length == 14) {
    date = new Date(
      t.slice(0, 4) + "-" +
      t.slice(4, 6) + "-" +
      t.slice(6, 8) + "T" +
      t.slice(8, 10) + ":" +
      t.slice(10, 12) + ":" +
      t.slice(12, 14) + "Z");
  }

  // Is the new date string a valid date? (Also guard against the case
  // above where the date component is not 14 characters long, so that
  // date is still null here.)
  if (date === null || date.toString() == "Invalid Date") {
    is_valid = false;
  }

  // Loop over the data reports in the document, if the document is
  // valid.
  if (is_valid === true) {
    for (var rpt in doc["data"]) {
      // Define the key array.
      var key = [
        p,
        rpt,
        t.slice(0, 4),
        t.slice(4, 6),
        t.slice(6, 8),
        t.slice(8, 10),
        t.slice(10, 12),
        t.slice(12, 14)];
      // Emit the key with the report entry.
      emit(key, doc["data"][rpt]);
    }
  }
}
```
Here is a typical document in the database:
```json
{
  "_id": "20210609190000__xyz321",
  "_rev": "2-123ad97e06286a7b1003f67f70520ab0",
  "data": {
    "fm": {
      "ta2m": 29.28,
      "rh2m": 6.812,
      "ws3mx": 6.352,
      "ws3ma": 3.518,
      "ws3m": 2.916,
      "wd3ma": 118.5,
      "wd3ms": 19.79,
      "wd3m": 114.9,
      "rain": 0,
      "srad": 1.059,
      "insl": 0.3181167,
      "tsl10cm": 24.93,
      "vbxt": 13.14,
      "tpnl": 31.23
    },
    "hr": {
      "ta2mx": {
        "value": 29.52,
        "date": "20210609182540"
      },
      "ta2mn": {
        "value": 28.7,
        "date": "20210609180550"
      },
      "ta2ma": 29.04,
      "rh2mx": {
        "value": 11.72,
        "date": "20210609182240"
      },
      "rh2mn": {
        "value": 5.177,
        "date": "20210609181530"
      },
      "rh2ma": 8.09,
      "ws3ma": 2.741,
      "wd3ma": 94.9,
      "wd3ms": 24.69,
      "sradx": {
        "value": 1.063,
        "date": "20210609185520"
      },
      "sradn": {
        "value": 1.033,
        "date": "20210609180010"
      },
      "srada": 1.05
    }
  }
}
```
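Working through the map function above with this sample document (my own reading, just to make the expected output concrete), I would expect it to emit two view rows, one per report group in `data`, roughly like the following (values truncated for brevity):

```json
[
  { "key": ["xyz321", "fm", "2021", "06", "09", "19", "00", "00"],
    "value": { "ta2m": 29.28, "rh2m": 6.812 } },
  { "key": ["xyz321", "hr", "2021", "06", "09", "19", "00", "00"],
    "value": { "ta2ma": 29.04, "ws3ma": 2.741 } }
]
```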
## Steps to Reproduce
Fresh install of Ubuntu 20.04 with a fresh install of CouchDB 3.1.1. Create a new database, populate it with more than ~100,000 documents, create the view, and execute the view. I've deleted and re-created multiple databases to test this, and each one displays the same behavior. A rough sketch of these steps is included below.
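For what it's worth, here is a sketch of those steps as a Node script. It is only an illustration, not the exact script I use: it assumes Node 18+ with built-in fetch, a placeholder host and admin credentials, synthetic document contents, and that a local file `rpts-prt-map.js` holds the map function shown earlier.

```js
const fs = require("fs");

const BASE = "https://my_server.my_org.com:6984"; // placeholder host
const headers = {
  "Authorization": "Basic " + Buffer.from("admin:password").toString("base64"),
  "Content-Type": "application/json"
};

async function reproduce() {
  // 1. Create the test database.
  await fetch(`${BASE}/test`, { method: "PUT", headers });

  // 2. Bulk-load ~500,000 synthetic documents shaped like the sample above,
  //    with IDs following the "YYYYMMDDHHMMSS__platformId" pattern.
  for (let batch = 0; batch < 1000; batch++) {
    const docs = [];
    for (let i = 0; i < 500; i++) {
      const n = batch * 500 + i;
      const ts = new Date(Date.UTC(2020, 0, 1) + n * 60000)
        .toISOString().replace(/[-:T]/g, "").slice(0, 14);
      docs.push({
        _id: `${ts}__stn${n % 50}`,
        data: { fm: { ta2m: 29.28, rh2m: 6.812 }, hr: { ta2ma: 29.04 } }
      });
    }
    await fetch(`${BASE}/test/_bulk_docs`, {
      method: "POST", headers, body: JSON.stringify({ docs })
    });
  }

  // 3. Create the design document containing the rpts-prt view.
  const mapSource = fs.readFileSync("rpts-prt-map.js", "utf8");
  await fetch(`${BASE}/test/_design/rpts-prt`, {
    method: "PUT", headers,
    body: JSON.stringify({ views: { "rpts-prt": { map: mapSource } } })
  });

  // 4. Execute the view. On the Ubuntu 20.04 host this is where I receive
  //    {"error":"os_process_error","reason":"{exit_status,1}", ...}.
  const res = await fetch(
    `${BASE}/test/_design/rpts-prt/_view/rpts-prt?descending=true&reduce=false&limit=101`,
    { headers });
  console.log(res.status, await res.text());
}

reproduce().catch(console.error);
```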
## Expected Behaviour
The index starts building after the first request to the view. I may receive a timeout error on that request, but the index continues building in the background until it is complete.
## Your Environment
* CouchDB version used: 3.1.1
* Browser name and version: curl, Firefox 88
* Operating system and version: Ubuntu 20.04
## Additional Context
Once the database is in production, we expect it to grow to several GB quite
quickly. Our views will mostly be stable, so this shouldn't be a deal breaker
in the future. However, if we do have to modify a view, it would be quite
annoying to have to set up a cron job to make a request to the modified view
every minute just so we can get a valid index built.
## Praise
I'm the sole developer at my organization and had been using a MySQL database to store our mission-critical data for several years. The MySQL database was never replicated to another server (the data was backed up, though) because I was so busy being the only tech guy (programming, field work, and sysadmin) that I kept pushing that off. I reached a point where the limitations of the MySQL database were causing me to spend too much time in application development just working around them. I began to seriously look at NoSQL databases, and, after experimenting with MongoDB, I decided that I liked the way CouchDB did things (also the license). I can now say that I love CouchDB. I had most of our data transitioned to CouchDB in two days, with no-fuss replication to another server at the same time. The production databases have been reliably serving data for several years now. It wasn't until we had a hard drive failure on one of our servers that we started fresh with Ubuntu 20.04 and discovered this problem. However, it's not enough to convince me even a little bit to leave the CouchDB way of doing things. Hat tip to the developers for producing a great database system. :)