[jira] Commented: (COUCHDB-767) do a non-blocking file:sync

2010-06-08 Thread Adam Kocoloski (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876733#action_12876733
 ] 

Adam Kocoloski commented on COUCHDB-767:


Hi Randall, I think there are a couple of other places where we need a 
couch_file:rename -- view index compaction, and the case in couch_db.erl where 
only a .compact file is hanging around.

I think you have a typo in the gen_server response to the rename call.  It 
should be something like {reply, ok, File#file{name=Name}}, right?
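
A minimal sketch of the shape such a clause might take, assuming a trimmed-down #file record that holds only the name. The module name rename_sketch and the name/1 helper are hypothetical and exist only so the example runs on its own; this is not the code from the attached patch.

```erlang
-module(rename_sketch).
-behaviour(gen_server).
-export([start_link/1, rename/2, name/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical, trimmed-down stand-in for couch_file's state record.
-record(file, {name}).

start_link(Name) ->
    gen_server:start_link(?MODULE, Name, []).

%% Tell the server that the file it manages now lives under NewName.
rename(Pid, NewName) ->
    gen_server:call(Pid, {rename, NewName}).

%% Helper for inspection only (an assumption, not part of couch_file).
name(Pid) ->
    gen_server:call(Pid, name).

init(Name) ->
    {ok, #file{name = Name}}.

handle_call({rename, Name}, _From, File) ->
    %% Reply ok and carry the new name forward in the server state,
    %% so that later calls reopen the file under the right path.
    {reply, ok, File#file{name = Name}};
handle_call(name, _From, #file{name = Name} = File) ->
    {reply, Name, File}.

handle_cast(_Msg, File) ->
    {noreply, File}.
```

The point of the third tuple element is that a reply which returns the old state would leave the server holding the stale .compact name.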


> do a non-blocking file:sync
> ---
>
> Key: COUCHDB-767
> URL: https://issues.apache.org/jira/browse/COUCHDB-767
> Project: CouchDB
> Issue Type: Improvement
> Components: Database Core
> Affects Versions: 0.11
> Reporter: Adam Kocoloski
> Fix For: 1.1
>
> Attachments: 767-async-fsync.patch, async_fsync.patch
>
>
> I've been taking a close look at couch_file performance in our production 
> systems.  One of the things I've noticed is that reads are occasionally 
> blocked for a long time by a slow call to file:sync.  I think this is 
> unnecessary.  I think we could do something like
>
>     handle_call(sync, From, #file{name=Name}=File) ->
>         spawn_link(fun() -> sync_file(Name, From) end),
>         {noreply, File};
>
> and then
>
>     sync_file(Name, From) ->
>         {ok, Fd} = file:open(Name, [read, raw]),
>         gen_server:reply(From, file:sync(Fd)),
>         file:close(Fd).
>
> Does anyone see a downside to this?  Individual clients of couch_file still 
> see exactly the same behavior as before, except that readers are not blocked 
> by syncs initiated in the db_updater process.  When data needs to be flushed, 
> file:sync is _much_ slower than spawning a local process and opening the file 
> again --  in the neighborhood of 1000x slower even on Linux with its 
> less-than-durable use of vanilla fsync.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (COUCHDB-767) do a non-blocking file:sync

2010-06-08 Thread Adam Kocoloski (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876727#action_12876727
 ] 

Adam Kocoloski commented on COUCHDB-767:


Hi Damien, the link you sent refers to writes.  Readers can definitely make 
progress during an fsync.




[jira] Commented: (COUCHDB-767) do a non-blocking file:sync

2010-06-08 Thread Damien Katz (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876722#action_12876722
 ] 

Damien Katz commented on COUCHDB-767:
-

The fsync on a separate thread/process might not work. Definitely load test the 
patch to ensure it's giving you what you expect.

http://antirez.com/post/fsync-different-thread-useless.html




[jira] Commented: (COUCHDB-767) do a non-blocking file:sync

2010-06-08 Thread Randall Leeds (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876573#action_12876573
 ] 

Randall Leeds commented on COUCHDB-767:
---

I should also mention that Adam told me about a bug in the original patch on 
IRC: the couch_db gen_server process for the compacted file needs to be 
notified of the name change when compaction completes. This patch adds the 
necessary couch_file:rename/2, called from couch_db_updater:commit_data/2.




[jira] Commented: (COUCHDB-767) do a non-blocking file:sync

2010-05-19 Thread Adam Kocoloski (JIRA)

[ 
https://issues.apache.org/jira/browse/COUCHDB-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869147#action_12869147
 ] 

Adam Kocoloski commented on COUCHDB-767:


After chatting with rnewson on IRC I realized I should clarify a few 
things:

1) I don't mean to say that the fsync got 1000x faster -- that would be a clear 
indication it wasn't doing its job.  What I mean to say is that the time to 
spawn the process which does the sync is 1000x less than the time to actually 
do the sync.  So from the perspective of the couch_file server process, 
spawning a worker is a net win.  The actual fsync still took 30 ms or more.

2) We need to open the file again because Erlang raw files can only be used by 
the process that opened them.  Even if that was not a rule, I think we'd want 
to do it anyway to get a separate file descriptor so that reads don't block.

3) I prefer opening the file each time we want to sync because fds can be a 
valuable resource.  Taking two for each DB seems wasteful.  It takes < 1 ms to 
open the file again.
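
The three points above can be sketched as a small, self-contained gen_server. This is only an illustration under stated assumptions: the module name async_sync_sketch, the reduced #file record, and the open modes used in init/1 are hypothetical and not taken from the patch. The worker reopens the file by name because a raw file handle cannot be shared with the process that spawned it.

```erlang
-module(async_sync_sketch).
-behaviour(gen_server).
-export([start_link/1, sync/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical, reduced version of the server state.
-record(file, {name, fd}).

start_link(Name) ->
    gen_server:start_link(?MODULE, Name, []).

%% The caller still blocks until the sync has finished.
sync(Pid) ->
    gen_server:call(Pid, sync, infinity).

init(Name) ->
    %% The server's own descriptor, used for ordinary reads and appends.
    {ok, Fd} = file:open(Name, [read, append, raw, binary]),
    {ok, #file{name = Name, fd = Fd}}.

handle_call(sync, From, #file{name = Name} = File) ->
    %% Hand the slow work to a short-lived worker and return at once,
    %% so other requests queued on this server are not held up.
    spawn_link(fun() -> sync_file(Name, From) end),
    {noreply, File}.

handle_cast(_Msg, File) ->
    {noreply, File}.

sync_file(Name, From) ->
    %% Reopen by name: this gives the worker its own descriptor,
    %% which is opened only for the duration of the sync.
    {ok, Fd} = file:open(Name, [read, raw]),
    gen_server:reply(From, file:sync(Fd)),
    file:close(Fd).
```

Because the worker answers with gen_server:reply/2, the process that called sync/1 sees the same return value it would have seen from a blocking implementation.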

I ran a relaximation test with delayed_commits = false on my OS X laptop.  The 
results are not terribly conclusive, especially without having measurement 
errors on the data points.  If you squint you might say that the patch improved 
the average reader response time:

http://mikeal.couchone.com/graphs/_design/app/_show/compareWriteReadTest/259e81ea09dd4173298a4431fb001d61

I'd love to see more scientific measurements (variance, median, X% 
distributions, etc.) if someone with node hacking skills is keen to contribute 
to that project.

Qualitatively I'd say that the patch makes a big difference in the long tail 
for readers, though.  I did some primitive slow query logging on one of our 
production boxes, basically doing now_diff() calls around couch_btree:get_node 
and other calls to read terms from couch_file.  I also logged fsyncs which took 
longer than 100 ms.  Whereas before a slow fsync would block the couch_file and 
a bunch of pread_iolist calls would get stuck behind it, with the patch the 
only time I saw slow document read times was when the actual pread call itself 
was slow.  I saw plenty of instances where the fsync for a file took 100s of ms 
but no slow reads were logged.
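
A rough sketch of that kind of slow-call logging, for anyone who wants to repeat the experiment. The module and function names are hypothetical, and it uses erlang:monotonic_time/1 rather than the now_diff() calls mentioned above, which is a substitution and not what was run on the production box.

```erlang
-module(slow_log_sketch).
-export([timed/3]).

%% Run Fun and return its result; log a warning if the call took
%% longer than ThresholdMs milliseconds.
timed(Label, ThresholdMs, Fun) ->
    T0 = erlang:monotonic_time(millisecond),
    Result = Fun(),
    Elapsed = erlang:monotonic_time(millisecond) - T0,
    case Elapsed > ThresholdMs of
        true ->
            error_logger:warning_msg("slow ~p: ~p ms~n", [Label, Elapsed]);
        false ->
            ok
    end,
    Result.
```

Wrapping a read would then look like slow_log_sketch:timed(get_node, 100, fun() -> couch_btree:get_node(Bt, Pos) end), where the arguments to get_node are assumed for illustration.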
