[GitHub] davisp opened a new pull request #1596: Fix couch server race condition
davisp opened a new pull request #1596: Fix couch server race condition URL: https://github.com/apache/couchdb/pull/1596 ## Overview There are two separate bugs here being fixed. The first was in `couch_server:terminate/2`. If there was an active open_async process when couch_server died it would throw a function clause when it tried to shutdown the database that wasn't yet open. We just filter out any `#entry{}` record to avoid the issue. This a simple bug but it prevented couch_server from properly generating a SASL report that ended up covering up the second more complex race condition. The second bug has to deal with a race between deletions and opens as well as the state of couch_server's mailbox. There's a very specific set of operations that have to happen in a particular order to trigger this bug: 1. An in-flight open or creation request that will ultimately succeed. The `'$gen_call'` message must be in couch_server's message queue before Step 2 finishes. 2. A deletion request 3. A second open or creation request that is enqueued before couch_server processes the `'$gen_call'` message from Step 1. 4. The couch_db_updater pid's `'EXIT'` signal is delayed until after the open_result from Step 3 is delivered to couch_server The underlying issue here is that the deletion request clears the `couch_dbs` ets table state without removing the possible corresponding state in couch_server's message queue. As things progress couch_server ends up mistaking the `open_result` message from Step 1 as corresponding to the open_async process spawned in Step 3. Currently this results in the client from Step 3 getting an invalid response from couch_server, and then more importantly, couch_server dies while attempting to process the real `open_result` message because of the state in the `couch_dbs` ets table being incorrect (it would throw a function_clause error in a list comprehension because `#entry.waiters` was undefined). The fix was just to ensure that the opener pid in the `#entry{}` record matches the pid of the caller. Anything that doesn't match is ignored since the deletion already cleaned up the server state. Although we do kill the couch_db_updater pid for extra security, Though technically its duplicating work that the deletion request already handled (via killing the `open_async` pid which is linked to it). ## Testing recommendations The second of three commits adds a failing test case that is fixed by the third commit. `make eunit` ## Checklist - [x] Code is written and works correctly; - [x] Changes are covered by tests; - [x] No documentation to change This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] nickva commented on a change in pull request #1596: Fix couch server race condition
nickva commented on a change in pull request #1596: Fix couch server race condition URL: https://github.com/apache/couchdb/pull/1596#discussion_r215695799 ## File path: src/couch/src/couch_server.erl ## @@ -421,27 +425,37 @@ handle_call({open_result, T0, DbName, {ok, Db}}, {FromPid, _Tag}, Server) -> true -> Server#server.lru end, -{reply, ok, Server#server{lru = Lru}} +{reply, ok, Server#server{lru = Lru}}; +[#entry{}] -> +% A mismatched opener pid means that this open_result message +% was in our mailbox but is now stale. Mostly ignore +% it except to ensure that the db pid is super dead. +exit(couch_db:get_pid(Db), kill), Review comment: Will Db ever be undefined here, then maybe a catch around it? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] davisp commented on a change in pull request #1596: Fix couch server race condition
davisp commented on a change in pull request #1596: Fix couch server race condition URL: https://github.com/apache/couchdb/pull/1596#discussion_r215696991 ## File path: src/couch/src/couch_server.erl ## @@ -421,27 +425,37 @@ handle_call({open_result, T0, DbName, {ok, Db}}, {FromPid, _Tag}, Server) -> true -> Server#server.lru end, -{reply, ok, Server#server{lru = Lru}} +{reply, ok, Server#server{lru = Lru}}; +[#entry{}] -> +% A mismatched opener pid means that this open_result message +% was in our mailbox but is now stale. Mostly ignore +% it except to ensure that the db pid is super dead. +exit(couch_db:get_pid(Db), kill), Review comment: If it does then we have bigger problems as this is an {ok, Db} result we're handling specifically. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] nickva commented on a change in pull request #1596: Fix couch server race condition
nickva commented on a change in pull request #1596: Fix couch server race condition URL: https://github.com/apache/couchdb/pull/1596#discussion_r215697732 ## File path: src/couch/src/couch_server.erl ## @@ -421,27 +425,37 @@ handle_call({open_result, T0, DbName, {ok, Db}}, {FromPid, _Tag}, Server) -> true -> Server#server.lru end, -{reply, ok, Server#server{lru = Lru}} +{reply, ok, Server#server{lru = Lru}}; +[#entry{}] -> +% A mismatched opener pid means that this open_result message +% was in our mailbox but is now stale. Mostly ignore +% it except to ensure that the db pid is super dead. +exit(couch_db:get_pid(Db), kill), Review comment: Oh, derp, we are getting it from an opener, not grabbing from the ets table This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] nickva commented on issue #1596: Fix couch server race condition
nickva commented on issue #1596: Fix couch server race condition URL: https://github.com/apache/couchdb/pull/1596#issuecomment-419169919 The issue is reproducible with with this script (in a remsh): ``` f(FuzzIt), FuzzIt = fun(DbName, Tries, WaitMax) -> Ref = erlang:monitor(process, whereis(couch_server)), [begin [begin CallRefs = [Open1Ref, DeleteRef, Open2Ref] = [make_ref(), make_ref(), make_ref()], whereis(couch_server) ! {'$gen_call', {self(), Open1Ref}, {create, DbName, []}}, timer:sleep(W), whereis(couch_server) ! {'$gen_call', {self(), DeleteRef}, {delete, DbName, []}}, whereis(couch_server) ! {'$gen_call', {self(), Open2Ref}, {open, DbName, []}}, [receive {R, _} -> ok end || R <- CallRefs] end || W <- lists:seq(0, WaitMax)] end || _ <- lists:seq(1, Tries)], receive {'DOWN', Ref, _, _, Reason} -> {bam, Reason} after 5000 -> nope end end. ``` When run like this for example: ``` FuzzIt(<<"blah">>, 100, 15). ``` It might help to have a few dbs open and some activity, like say a bunch of replication jobs running as well. The node would crash and possibly restart leaving something like this in the log: ``` [error] 2018-09-06T17:01:48.085468Z node1@127.0.0.1 <0.243.0> couch_server terminating with {function_clause,[{couch_server,'-handle_call/3-lc$^1/1-1-',[undefined],[{file,"src/couch_server.erl"},{line,434}]},{couch_server,handle_call,3,[{file,"src/couch_server.erl"}, ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] nickva edited a comment on issue #1596: Fix couch server race condition
nickva edited a comment on issue #1596: Fix couch server race condition URL: https://github.com/apache/couchdb/pull/1596#issuecomment-419169919 The issue is reproducible with this script (in a remsh): ``` f(FuzzIt), FuzzIt = fun(DbName, Tries, WaitMax) -> Ref = erlang:monitor(process, whereis(couch_server)), [begin [begin CallRefs = [Open1Ref, DeleteRef, Open2Ref] = [make_ref(), make_ref(), make_ref()], whereis(couch_server) ! {'$gen_call', {self(), Open1Ref}, {create, DbName, []}}, timer:sleep(W), whereis(couch_server) ! {'$gen_call', {self(), DeleteRef}, {delete, DbName, []}}, whereis(couch_server) ! {'$gen_call', {self(), Open2Ref}, {open, DbName, []}}, [receive {R, _} -> ok end || R <- CallRefs] end || W <- lists:seq(0, WaitMax)] end || _ <- lists:seq(1, Tries)], receive {'DOWN', Ref, _, _, Reason} -> {bam, Reason} after 5000 -> nope end end. ``` When run like this for example: ``` FuzzIt(<<"blah">>, 100, 15). ``` It might help to have a few dbs open and some activity, like say a bunch of replication jobs running as well. The node would crash and possibly restart leaving something like this in the log: ``` [error] 2018-09-06T17:01:48.085468Z node1@127.0.0.1 <0.243.0> couch_server terminating with {function_clause,[{couch_server,'-handle_call/3-lc$^1/1-1-',[undefined],[{file,"src/couch_server.erl"},{line,434}]},{couch_server,handle_call,3,[{file,"src/couch_server.erl"}, ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] davisp commented on issue #1596: Fix couch server race condition
davisp commented on issue #1596: Fix couch server race condition URL: https://github.com/apache/couchdb/pull/1596#issuecomment-419209179 Ah, @iilyak confirmed. Taking that as all good. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] davisp commented on issue #1596: Fix couch server race condition
davisp commented on issue #1596: Fix couch server race condition URL: https://github.com/apache/couchdb/pull/1596#issuecomment-419209060 @nickva And to be clear, you didn't manage to reproduce it with that script after this fix right? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] davisp closed pull request #1596: Fix couch server race condition
davisp closed pull request #1596: Fix couch server race condition URL: https://github.com/apache/couchdb/pull/1596 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] code-noggin opened a new issue #104: Unable to use 2.2.0 as base image
code-noggin opened a new issue #104: Unable to use 2.2.0 as base image URL: https://github.com/apache/couchdb-docker/issues/104 ## Expected Behavior I'm trying to use 2.2.0 as a base image for my own docker image. I should be able to run `apt-get update` in my Dockerfile without an error. ``` FROM couchdb:2.2.0 RUN apt-get update ``` ## Current Behavior The `apt-get update` fails when fetching `https://apache.bintray.com/couchdb-deb/dists/stretch/InRelease` because the `apt-transport-https` package was purged at https://github.com/apache/couchdb-docker/blob/f429c1ccf22fe8cf7717383462fbf2f56e6d0301/2.2.0/Dockerfile#L138. Here's the build output: ``` docker build . Sending build context to Docker daemon 146.4kB Step 1/16 : FROM couchdb:2.2.0 ---> 44b2522e3e4a Step 2/16 : RUN apt-get update ---> Running in e6de50b54b92 Get:1 http://security.debian.org/debian-security stretch/updates InRelease [94.3 kB] Get:2 http://security.debian.org/debian-security stretch/updates/main amd64 Packages [393 kB] Ign:3 http://cdn-fastly.deb.debian.org/debian stretch InRelease Get:4 http://cdn-fastly.deb.debian.org/debian stretch-updates InRelease [91.0 kB] Get:5 http://cdn-fastly.deb.debian.org/debian stretch Release [118 kB] Get:6 http://cdn-fastly.deb.debian.org/debian stretch Release.gpg [2434 B] Get:7 http://cdn-fastly.deb.debian.org/debian stretch-updates/main amd64 Packages [5148 B] Get:8 http://cdn-fastly.deb.debian.org/debian stretch/main amd64 Packages [7099 kB] Fetched 7802 kB in 2s (2943 kB/s) Reading package lists... E: The method driver /usr/lib/apt/methods/https could not be found. E: Failed to fetch https://apache.bintray.com/couchdb-deb/dists/stretch/InRelease E: Some index files failed to download. They have been ignored, or old ones used instead. The command '/bin/sh -c apt-get update' returned a non-zero code: 100 ``` ## Possible Solution Do not include `apt-transport-https` as one of the `buildDeps` at https://github.com/apache/couchdb-docker/blob/f429c1ccf22fe8cf7717383462fbf2f56e6d0301/2.2.0/Dockerfile#L114. ## Steps to Reproduce (for bugs) ``` #!/bin/sh cat <<-END-OF-DOCKERFILE > Dockerfile.couchdb_extension FROM couchdb:2.2.0 RUN apt-get update END-OF-DOCKERFILE docker build -f Dockerfile.couchdb_extension . ``` ## Context I'm trying to create a custom docker image based on the official couchdb:2.2.0 docker image. My image will have custom couchdb config and run an additional application in the background. ## Your Environment * Version used: `Docker version 18.06.1-ce, build e68fc7a` * Browser Name and version: n/a * Operating System and version (desktop or mobile): macOS High Sierra * Link to your project: n/a This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] nickva opened a new pull request #1597: Allow disabling off-heap messages
nickva opened a new pull request #1597: Allow disabling off-heap messages URL: https://github.com/apache/couchdb/pull/1597 ### Description Off-heap messages is an Erlang 19 feature: http://erlang.org/doc/man/erlang.html#process_flag_message_queue_data It is adviseable to use that setting for processes which expect to receive a lot of messages. CouchDB sets it for couch_server, couch_log_server and bunch of others as well. In some cases the off-heap behavior could alter the timing of message receives and expose subtle bugs that have been lurking in the code for years. Or could slightly reduce performance, so a safety measure allow disabling it. ### How to test ``` > config:set("couchdb","enable_off_heap_messages", "false"). > Pid = whereis(couch_server). <0.311.0> > exit(Pid, kill). > erlang:process_info(whereis(couch_server), message_queue_data). {message_queue_data,on_heap} ``` ### Related https://github.com/apache/couchdb/pull/1392 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] nickva commented on issue #1597: Allow disabling off-heap messages
nickva commented on issue #1597: Allow disabling off-heap messages URL: https://github.com/apache/couchdb/pull/1597#issuecomment-419253500 Good idea, @davisp This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] nickva closed pull request #1597: Allow disabling off-heap messages
nickva closed pull request #1597: Allow disabling off-heap messages URL: https://github.com/apache/couchdb/pull/1597 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] davisp commented on issue #1593: Couch server improvements
davisp commented on issue #1593: Couch server improvements URL: https://github.com/apache/couchdb/pull/1593#issuecomment-419262317 Another thing we should consider is sending the Timeout value in the open messages so that couch_server can discard any requests from a client that's already wandered off. https://github.com/apache/couchdb/blob/4afed4e386f3f8f1908ba989f49aaff506177be4/src/couch/src/couch_server.erl#L89 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] davisp commented on issue #1593: Couch server improvements
davisp commented on issue #1593: Couch server improvements URL: https://github.com/apache/couchdb/pull/1593#issuecomment-419262745 And by timeout, I mean we should send an expiry in the message. So something like "now + Timestamp" and then couch_server can throw it away if that's in the past. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] wohali closed issue #104: Unable to use 2.2.0 as base image
wohali closed issue #104: Unable to use 2.2.0 as base image URL: https://github.com/apache/couchdb-docker/issues/104 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] wohali commented on issue #104: Unable to use 2.2.0 as base image
wohali commented on issue #104: Unable to use 2.2.0 as base image URL: https://github.com/apache/couchdb-docker/issues/104#issuecomment-419303066 Hi there, duplicate of #101. Our very next version of this will correct the problem. If you need to have a workaround sooner, you can build the couchdb image using the files from this repository yourself. Sorry for the inconvenience. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sklassen commented on issue #27: vm.args is now editable -- enabling clusters; snap set works for name and cookie
sklassen commented on issue #27: vm.args is now editable -- enabling clusters; snap set works for name and cookie URL: https://github.com/apache/couchdb-pkg/pull/27#issuecomment-419307987 Hi @wohali Two questions. Are there code names for database formats? For example the 1.7x file format is different from the 2.2; but is 2.0 and 2.2 file format different? I've created a sub directory in the common folder so that the snap can host multiple database formats. I also want to test this snap from the Ubuntu store. Should I do this under my own login? Thanks Simon This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services