[GitHub] davisp opened a new pull request #1596: Fix couch server race condition

2018-09-06 Thread GitBox
davisp opened a new pull request #1596: Fix couch server race condition
URL: https://github.com/apache/couchdb/pull/1596
 
 
   ## Overview
   
   There are two separate bugs here being fixed. The first was in 
`couch_server:terminate/2`. If there was an active open_async process when 
couch_server died it would throw a function clause when it tried to shutdown 
the database that wasn't yet open. We just filter out any `#entry{}` record to 
avoid the issue. This a simple bug but it prevented couch_server from properly 
generating a SASL report that ended up covering up the second more complex race 
condition.
   
   The second bug has to deal with a race between deletions and opens as well 
as the state of couch_server's mailbox. There's a very specific set of 
operations that have to happen in a particular order to trigger this bug:
   
   1. An in-flight open or creation request that will ultimately succeed. The 
`'$gen_call'` message must be in couch_server's message queue before Step 2 
finishes.
   2. A deletion request
   3. A second open or creation request that is enqueued before couch_server 
processes the `'$gen_call'` message from Step 1.
   4. The couch_db_updater pid's `'EXIT'` signal is delayed until after the 
open_result from Step 3 is delivered to couch_server
   
   The underlying issue here is that the deletion request clears the 
`couch_dbs` ets table state without removing the possible corresponding state 
in couch_server's message queue. As things progress couch_server ends up 
mistaking the `open_result` message from Step 1 as corresponding to the 
open_async process spawned in Step 3. Currently this results in the client from 
Step 3 getting an invalid response from couch_server, and then more 
importantly, couch_server dies while attempting to process the real 
`open_result` message because of the state in the `couch_dbs` ets table being 
incorrect (it would throw a function_clause error in a list comprehension 
because `#entry.waiters` was undefined).
   
   The fix was just to ensure that the opener pid in the `#entry{}` record 
matches the pid of the caller. Anything that doesn't match is ignored since the 
deletion already cleaned up the server state. Although we do kill the 
couch_db_updater pid for extra security, Though technically its duplicating 
work that the deletion request already handled (via killing the `open_async` 
pid which is linked to it).
   
   ## Testing recommendations
   
   The second of three commits adds a failing test case that is fixed by the 
third commit.
   
   `make eunit`
   
   ## Checklist
   
   - [x] Code is written and works correctly;
   - [x] Changes are covered by tests;
   - [x] No documentation to change
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] nickva commented on a change in pull request #1596: Fix couch server race condition

2018-09-06 Thread GitBox
nickva commented on a change in pull request #1596: Fix couch server race 
condition
URL: https://github.com/apache/couchdb/pull/1596#discussion_r215695799
 
 

 ##
 File path: src/couch/src/couch_server.erl
 ##
 @@ -421,27 +425,37 @@ handle_call({open_result, T0, DbName, {ok, Db}}, 
{FromPid, _Tag}, Server) ->
 true ->
 Server#server.lru
 end,
-{reply, ok, Server#server{lru = Lru}}
+{reply, ok, Server#server{lru = Lru}};
+[#entry{}] ->
+% A mismatched opener pid means that this open_result message
+% was in our mailbox but is now stale. Mostly ignore
+% it except to ensure that the db pid is super dead.
+exit(couch_db:get_pid(Db), kill),
 
 Review comment:
   Will Db ever be undefined here, then maybe a catch around it?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] davisp commented on a change in pull request #1596: Fix couch server race condition

2018-09-06 Thread GitBox
davisp commented on a change in pull request #1596: Fix couch server race 
condition
URL: https://github.com/apache/couchdb/pull/1596#discussion_r215696991
 
 

 ##
 File path: src/couch/src/couch_server.erl
 ##
 @@ -421,27 +425,37 @@ handle_call({open_result, T0, DbName, {ok, Db}}, 
{FromPid, _Tag}, Server) ->
 true ->
 Server#server.lru
 end,
-{reply, ok, Server#server{lru = Lru}}
+{reply, ok, Server#server{lru = Lru}};
+[#entry{}] ->
+% A mismatched opener pid means that this open_result message
+% was in our mailbox but is now stale. Mostly ignore
+% it except to ensure that the db pid is super dead.
+exit(couch_db:get_pid(Db), kill),
 
 Review comment:
   If it does then we have bigger problems as this is an {ok, Db} result we're 
handling specifically.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] nickva commented on a change in pull request #1596: Fix couch server race condition

2018-09-06 Thread GitBox
nickva commented on a change in pull request #1596: Fix couch server race 
condition
URL: https://github.com/apache/couchdb/pull/1596#discussion_r215697732
 
 

 ##
 File path: src/couch/src/couch_server.erl
 ##
 @@ -421,27 +425,37 @@ handle_call({open_result, T0, DbName, {ok, Db}}, 
{FromPid, _Tag}, Server) ->
 true ->
 Server#server.lru
 end,
-{reply, ok, Server#server{lru = Lru}}
+{reply, ok, Server#server{lru = Lru}};
+[#entry{}] ->
+% A mismatched opener pid means that this open_result message
+% was in our mailbox but is now stale. Mostly ignore
+% it except to ensure that the db pid is super dead.
+exit(couch_db:get_pid(Db), kill),
 
 Review comment:
   Oh, derp, we are getting it from an opener, not grabbing from the ets table 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] nickva commented on issue #1596: Fix couch server race condition

2018-09-06 Thread GitBox
nickva commented on issue #1596: Fix couch server race condition
URL: https://github.com/apache/couchdb/pull/1596#issuecomment-419169919
 
 
   The issue is reproducible with with this script (in a remsh):
   
   ```
   f(FuzzIt), FuzzIt = fun(DbName, Tries, WaitMax) ->
   Ref = erlang:monitor(process, whereis(couch_server)),
   [begin
[begin
   CallRefs = [Open1Ref, DeleteRef, Open2Ref] = [make_ref(), 
make_ref(), make_ref()],
   whereis(couch_server) ! {'$gen_call', {self(), Open1Ref}, 
{create, DbName, []}},
   timer:sleep(W),
   whereis(couch_server) ! {'$gen_call', {self(), DeleteRef}, 
{delete, DbName, []}},
   whereis(couch_server) ! {'$gen_call', {self(), Open2Ref}, 
{open, DbName, []}},
   [receive {R, _} -> ok end || R <- CallRefs]
end || W <- lists:seq(0, WaitMax)]
end || _ <- lists:seq(1, Tries)],
   receive  {'DOWN', Ref, _, _, Reason} ->  {bam, Reason}
   after 5000 -> nope
   end
   end.
   ```
   When run like this for example:
   ```
   FuzzIt(<<"blah">>, 100, 15).
   ```  
   
   It might help to have a few dbs open and some activity, like say a bunch of 
replication jobs running as well.
   
   The node would crash and possibly restart leaving something like this in the 
log:
   ```
   [error] 2018-09-06T17:01:48.085468Z node1@127.0.0.1 <0.243.0>  
couch_server terminating with 
{function_clause,[{couch_server,'-handle_call/3-lc$^1/1-1-',[undefined],[{file,"src/couch_server.erl"},{line,434}]},{couch_server,handle_call,3,[{file,"src/couch_server.erl"},
   ```
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] nickva edited a comment on issue #1596: Fix couch server race condition

2018-09-06 Thread GitBox
nickva edited a comment on issue #1596: Fix couch server race condition
URL: https://github.com/apache/couchdb/pull/1596#issuecomment-419169919
 
 
   The issue is reproducible with this script (in a remsh):
   
   ```
   f(FuzzIt), FuzzIt = fun(DbName, Tries, WaitMax) ->
   Ref = erlang:monitor(process, whereis(couch_server)),
   [begin
[begin
   CallRefs = [Open1Ref, DeleteRef, Open2Ref] = [make_ref(), 
make_ref(), make_ref()],
   whereis(couch_server) ! {'$gen_call', {self(), Open1Ref}, 
{create, DbName, []}},
   timer:sleep(W),
   whereis(couch_server) ! {'$gen_call', {self(), DeleteRef}, 
{delete, DbName, []}},
   whereis(couch_server) ! {'$gen_call', {self(), Open2Ref}, 
{open, DbName, []}},
   [receive {R, _} -> ok end || R <- CallRefs]
end || W <- lists:seq(0, WaitMax)]
end || _ <- lists:seq(1, Tries)],
   receive  {'DOWN', Ref, _, _, Reason} ->  {bam, Reason}
   after 5000 -> nope
   end
   end.
   ```
   When run like this for example:
   ```
   FuzzIt(<<"blah">>, 100, 15).
   ```  
   
   It might help to have a few dbs open and some activity, like say a bunch of 
replication jobs running as well.
   
   The node would crash and possibly restart leaving something like this in the 
log:
   ```
   [error] 2018-09-06T17:01:48.085468Z node1@127.0.0.1 <0.243.0>  
couch_server terminating with 
{function_clause,[{couch_server,'-handle_call/3-lc$^1/1-1-',[undefined],[{file,"src/couch_server.erl"},{line,434}]},{couch_server,handle_call,3,[{file,"src/couch_server.erl"},
   ```
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] davisp commented on issue #1596: Fix couch server race condition

2018-09-06 Thread GitBox
davisp commented on issue #1596: Fix couch server race condition
URL: https://github.com/apache/couchdb/pull/1596#issuecomment-419209179
 
 
   Ah, @iilyak confirmed. Taking that as all good.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] davisp commented on issue #1596: Fix couch server race condition

2018-09-06 Thread GitBox
davisp commented on issue #1596: Fix couch server race condition
URL: https://github.com/apache/couchdb/pull/1596#issuecomment-419209060
 
 
   @nickva And to be clear, you didn't manage to reproduce it with that script 
after this fix right?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] davisp closed pull request #1596: Fix couch server race condition

2018-09-06 Thread GitBox
davisp closed pull request #1596: Fix couch server race condition
URL: https://github.com/apache/couchdb/pull/1596
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] code-noggin opened a new issue #104: Unable to use 2.2.0 as base image

2018-09-06 Thread GitBox
code-noggin opened a new issue #104: Unable to use 2.2.0 as base image
URL: https://github.com/apache/couchdb-docker/issues/104
 
 
   
   
   ## Expected Behavior
   
   
   I'm trying to use 2.2.0 as a base image for my own docker image. I should be 
able to run `apt-get update` in my Dockerfile without an error.
   ```
   FROM couchdb:2.2.0
   RUN apt-get update 
   ```
   
   ## Current Behavior
   
   
   
   The `apt-get update` fails when fetching 
`https://apache.bintray.com/couchdb-deb/dists/stretch/InRelease` because the 
`apt-transport-https` package was purged at 
https://github.com/apache/couchdb-docker/blob/f429c1ccf22fe8cf7717383462fbf2f56e6d0301/2.2.0/Dockerfile#L138.
 
   
   Here's the build output:
   ```
   docker build  .
   Sending build context to Docker daemon  146.4kB
   Step 1/16 : FROM couchdb:2.2.0
---> 44b2522e3e4a
   Step 2/16 : RUN apt-get update
---> Running in e6de50b54b92
   Get:1 http://security.debian.org/debian-security stretch/updates InRelease 
[94.3 kB]
   Get:2 http://security.debian.org/debian-security stretch/updates/main amd64 
Packages [393 kB]
   Ign:3 http://cdn-fastly.deb.debian.org/debian stretch InRelease
   Get:4 http://cdn-fastly.deb.debian.org/debian stretch-updates InRelease 
[91.0 kB]
   Get:5 http://cdn-fastly.deb.debian.org/debian stretch Release [118 kB]
   Get:6 http://cdn-fastly.deb.debian.org/debian stretch Release.gpg [2434 B]
   Get:7 http://cdn-fastly.deb.debian.org/debian stretch-updates/main amd64 
Packages [5148 B]
   Get:8 http://cdn-fastly.deb.debian.org/debian stretch/main amd64 Packages 
[7099 kB]
   Fetched 7802 kB in 2s (2943 kB/s)
   Reading package lists...
   E: The method driver /usr/lib/apt/methods/https could not be found.
   E: Failed to fetch 
https://apache.bintray.com/couchdb-deb/dists/stretch/InRelease
   E: Some index files failed to download. They have been ignored, or old ones 
used instead.
   The command '/bin/sh -c apt-get update' returned a non-zero code: 100
   ```
   
   ## Possible Solution
   
   
   
   Do not include `apt-transport-https` as one of the `buildDeps` at 
https://github.com/apache/couchdb-docker/blob/f429c1ccf22fe8cf7717383462fbf2f56e6d0301/2.2.0/Dockerfile#L114.
 
   
   ## Steps to Reproduce (for bugs)
   
   
   
   ```
   #!/bin/sh
   
   cat <<-END-OF-DOCKERFILE > Dockerfile.couchdb_extension
   FROM couchdb:2.2.0
   RUN apt-get update
   END-OF-DOCKERFILE
   
   docker build -f Dockerfile.couchdb_extension .
   ```
   
   ## Context
   
   
   
   I'm trying to create a custom docker image based on the official 
couchdb:2.2.0 docker image. My image will have custom couchdb config and run an 
additional application in the background.
   
   ## Your Environment
   
   * Version used: `Docker version 18.06.1-ce, build e68fc7a`
   * Browser Name and version: n/a
   * Operating System and version (desktop or mobile): macOS High Sierra
   * Link to your project: n/a
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] nickva opened a new pull request #1597: Allow disabling off-heap messages

2018-09-06 Thread GitBox
nickva opened a new pull request #1597: Allow disabling off-heap messages
URL: https://github.com/apache/couchdb/pull/1597
 
 
   ### Description
   
   Off-heap messages is an Erlang 19 feature:
   
   http://erlang.org/doc/man/erlang.html#process_flag_message_queue_data
   
   It is adviseable to use that setting for processes which expect to receive a
   lot of messages. CouchDB sets it for couch_server, couch_log_server and bunch
   of others as well.
   
   In some cases the off-heap behavior could alter the timing of message 
receives
   and expose subtle bugs that have been lurking in the code for years. Or could
   slightly reduce performance, so a safety measure allow disabling it.
   
   ### How to test
   
   ```
   > config:set("couchdb","enable_off_heap_messages", "false").
   
   > Pid = whereis(couch_server).
   <0.311.0>
   
   > exit(Pid, kill).
   
   > erlang:process_info(whereis(couch_server), message_queue_data).
   {message_queue_data,on_heap}
   
   ```
   
   ### Related
   
https://github.com/apache/couchdb/pull/1392
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] nickva commented on issue #1597: Allow disabling off-heap messages

2018-09-06 Thread GitBox
nickva commented on issue #1597: Allow disabling off-heap messages
URL: https://github.com/apache/couchdb/pull/1597#issuecomment-419253500
 
 
   Good idea, @davisp 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] nickva closed pull request #1597: Allow disabling off-heap messages

2018-09-06 Thread GitBox
nickva closed pull request #1597: Allow disabling off-heap messages
URL: https://github.com/apache/couchdb/pull/1597
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] davisp commented on issue #1593: Couch server improvements

2018-09-06 Thread GitBox
davisp commented on issue #1593: Couch server improvements
URL: https://github.com/apache/couchdb/pull/1593#issuecomment-419262317
 
 
   Another thing we should consider is sending the Timeout value in the open 
messages so that couch_server can discard any requests from a client that's 
already wandered off.
   
   
https://github.com/apache/couchdb/blob/4afed4e386f3f8f1908ba989f49aaff506177be4/src/couch/src/couch_server.erl#L89


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] davisp commented on issue #1593: Couch server improvements

2018-09-06 Thread GitBox
davisp commented on issue #1593: Couch server improvements
URL: https://github.com/apache/couchdb/pull/1593#issuecomment-419262745
 
 
   And by timeout, I mean we should send an expiry in the message. So something 
like "now + Timestamp" and then couch_server can throw it away if that's in the 
past.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] wohali closed issue #104: Unable to use 2.2.0 as base image

2018-09-06 Thread GitBox
wohali closed issue #104: Unable to use 2.2.0 as base image
URL: https://github.com/apache/couchdb-docker/issues/104
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] wohali commented on issue #104: Unable to use 2.2.0 as base image

2018-09-06 Thread GitBox
wohali commented on issue #104: Unable to use 2.2.0 as base image
URL: https://github.com/apache/couchdb-docker/issues/104#issuecomment-419303066
 
 
   Hi there, duplicate of #101. Our very next version of this will correct the 
problem. If you need to have a workaround sooner, you can build the couchdb 
image using the files from this repository yourself. Sorry for the 
inconvenience.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] sklassen commented on issue #27: vm.args is now editable -- enabling clusters; snap set works for name and cookie

2018-09-06 Thread GitBox
sklassen commented on issue #27: vm.args is now editable -- enabling clusters; 
snap set works for name and cookie 
URL: https://github.com/apache/couchdb-pkg/pull/27#issuecomment-419307987
 
 
   Hi @wohali  Two questions.
   
   Are there code names for database formats? For example the 1.7x file format 
is different from the 2.2; but is 2.0 and 2.2 file format different? I've 
created a sub directory in the common folder so that the snap can host multiple 
database formats.
   
   I also want to test this snap from the Ubuntu store. Should I do this under 
my own login?
   
   Thanks
   Simon


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services