rewrite security
Devs,

Today I wrote a patch, and backported it to 0.11.x, concerning the security guarantees made by the rewriter.

Previously we allowed the rewriter to include as many ".." segments in the target path as the developer wanted. This is great for flexibility, and allows you to create a vhost which doesn't provide the full CouchDB API but can still access more than one database.

I've changed the default behavior so that you can have a maximum of 2 ".." segments in your rewrite paths. So, for example, you can rewrite from

/db/_design/foo/_rewrite

to

/db/
/db/somedoc
/db/_design/foo/_show/bar

but you CAN'T rewrite to

/otherdb/anything

This allows databases to be "jailed" inside vhosts, which is very important for security as we move forward. Prior to this patch, there was no way to run untrusted apps on the same CouchDB VM as each other. Now, by using vhosts and putting apps/dbs on different domains, we can protect the applications from each other.

My patch maintains the previous functionality (unlimited rewriting), but it is off by default: secure_rewrites=true is the default. So if you are currently relying on cross-db rewrites, you'll need to edit your local.ini file to set secure_rewrites=false.

There are some missing parts in my implementation. I plan to finish this change by making a configurable whitelist of global_httpd handlers that *can* be accessed within vhosts. The ones I can think of are:

_utils
_uuids
_session
_oauth
_users

Some installations might want to turn on others (_replicate, _stats, etc.), hence the list should be configurable.

My plan is to implement this as a whitelist that is checked by the vhost engine, so that these global handlers are available on all vhosts. This way you can easily add a database that needs to be available to all your vhosts using a single configuration directive.

I think this plan is sane, but please let me know if you see issues that I'm missing.

Chris
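To make the ".." limit concrete, here is a minimal Python sketch of the rule as described above. It is not the actual CouchDB (Erlang) implementation; the function names, the MAX_PARENT_SEGMENTS constant and the assumption that rewrite targets are resolved relative to /db/_design/foo are all illustrative.

```python
import posixpath

# From the post: at most two ".." segments are allowed by default.
MAX_PARENT_SEGMENTS = 2  # hypothetical constant name


def is_rewrite_allowed(target: str, secure_rewrites: bool = True) -> bool:
    """Return True if a rewrite target passes the ".." segment limit.

    With secure_rewrites disabled, the old unlimited behaviour applies.
    """
    if not secure_rewrites:
        return True
    segments = [s for s in target.split("/") if s]
    return segments.count("..") <= MAX_PARENT_SEGMENTS


def resolve_target(db: str, ddoc: str, target: str) -> str:
    """Resolve a rewrite target relative to the design document path.

    Assumption: targets are relative to /db/_design/ddoc, so two ".."
    segments reach /db/ and a third one would leave the database.
    """
    base = f"/{db}/_design/{ddoc}"
    return posixpath.normpath(posixpath.join(base, target))
```

Under these assumptions "../../somedoc" resolves to /db/somedoc and is allowed, while "../../../otherdb/anything" resolves to /otherdb/anything and is rejected unless secure_rewrites is turned off.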
[jira] Commented: (COUCHDB-754) Improve couch_file write performance
[ https://issues.apache.org/jira/browse/COUCHDB-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864542#action_12864542 ]

Adam Kocoloski commented on COUCHDB-754:

Oh, another comment about O_(D)SYNC -- I don't think it works properly on NFS. Perhaps a good thing to do here is to borrow the wal_sync_method options from Postgres:

http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm

so to use O_DSYNC we would have

[couchdb]
sync_method = open_datasync ; supply O_DSYNC when opening the file

> Improve couch_file write performance
>
> Key: COUCHDB-754
> URL: https://issues.apache.org/jira/browse/COUCHDB-754
> Project: CouchDB
> Issue Type: Improvement
> Environment: some code might be platform-specific
> Reporter: Adam Kocoloski
> Fix For: 1.1
> Attachments: cheaper-appending-v2.patch, cheaper-appending.patch
>
> I've got a number of possible enhancements to couch_file floating around in
> my head, wanted to write them down.
>
> * Use fdatasync instead of fsync. Filipe posted a patch to the OTP file
> driver [1] that adds a new file:datasync/1 function. I suspect that we won't
> see much of a performance gain from this switch because we append to the file
> and thus need to update the file metadata anyway. On the other hand, I'm
> fairly certain fdatasync is always safe for our needs, so if it is ever more
> efficient we should use it. Obviously, we'll need to fall back to
> file:sync/1 on platforms where the datasync function is not available.
>
> * Use file:pwrite/2 to batch together multiple outstanding write requests.
> This is essentially Paul's zip_server [2]. In order to take full advantage
> of it we need to patch couch_btree to update nodes in parallel. Currently
> there should only be 1 outstanding write request in a couch_file at a time,
> so it wouldn't help at all.
>
> * Open the file in append mode and stop seeking to eof in user space. We
> never modify files (aside from truncating, which is rare enough to be handled
> separately), so perhaps it would help with performance if we let the kernel
> deal with the seek. We'd still need a way to get the file size for the
> make_blocks function. I'm wondering if file:read_file_info(Fd) is more
> efficient than file:position(Fd, eof) for this purpose.
>
> A caveat - I'm not sure if append-only files are compatible with the previous
> enhancement. There is no file:write/2, and I have no idea how file:pwrite
> behaves on a file which is opened append-only. Is the Pos ignored, or is it
> an error? Will have to test.
>
> * Use O_DSYNC instead of fsync/fdatasync. This one is inspired by antirez'
> recent blog post [3] and some historical discussions on pgsql-performance.
> Basically, it seems that opening a file with O_DSYNC (or O_SYNC on Linux,
> which is currently the same thing) and doing all synchronous writes is
> reasonably fast. Antirez' tests showed 250 µs delays for (tiny) synchronous
> writes, compared to 40 ms delays for fsync and fdatasync on his ext4 system.
> At the very least, this looks to be a compelling choice for file access when
> the server is running with delayed_commits = true. We'd need to patch the
> OTP file driver again, and also investigate the cross-platform support. In
> particular, I don't think it works on NFS.
>
> [1]: http://github.com/fdmanana/otp/tree/fdatasync
> [2]: http://github.com/davisp/zip_server
> [3]: http://antirez.com/post/fsync-different-thread-useless.html

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
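For readers less familiar with the system calls being compared, the following Python sketch shows the difference between syncing after a write (fsync or fdatasync) and opening the file with O_DSYNC or O_SYNC so that each write is synchronous. It is only an illustration on a POSIX system, not CouchDB code: the append_durably function is hypothetical, and the sync_method names are borrowed from the Postgres-style option proposed in the comment above, which is itself only a suggestion.

```python
import os


def append_durably(path: str, data: bytes, sync_method: str = "fsync") -> int:
    """Append data to path using one of four durability strategies.

    sync_method values mirror the proposed option names (assumption):
      fsync         - write, then os.fsync
      fdatasync     - write, then os.fdatasync (falls back to fsync)
      open_datasync - open with O_DSYNC, no explicit sync call
      open_sync     - open with O_SYNC, no explicit sync call
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    if sync_method == "open_datasync":
        # getattr keeps the sketch importable where the flag is missing;
        # in that case the write is simply not synchronous.
        flags |= getattr(os, "O_DSYNC", 0)
    elif sync_method == "open_sync":
        flags |= getattr(os, "O_SYNC", 0)
    elif sync_method not in ("fsync", "fdatasync"):
        raise ValueError(f"unknown sync_method: {sync_method}")

    fd = os.open(path, flags, 0o644)
    try:
        written = os.write(fd, data)
        if sync_method == "fsync":
            os.fsync(fd)
        elif sync_method == "fdatasync":
            # Fall back to fsync on platforms without fdatasync.
            getattr(os, "fdatasync", os.fsync)(fd)
        return written
    finally:
        os.close(fd)
```

Timing the four variants on the same file system gives a rough feel for the trade-off discussed in the issue, although the numbers will depend heavily on the platform and storage.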
[jira] Commented: (COUCHDB-754) Improve couch_file write performance
[ https://issues.apache.org/jira/browse/COUCHDB-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864518#action_12864518 ]

Adam Kocoloski commented on COUCHDB-754:

Hi Sebastian, not OT at all. I believe you've got the axes right. The numbers at the bottom represent the average response time divided by the number of clients.
[jira] Commented: (COUCHDB-754) Improve couch_file write performance
[ https://issues.apache.org/jira/browse/COUCHDB-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864492#action_12864492 ]

Sebastian Cohnen commented on COUCHDB-754:

sorry for being OT here, but what does this graph show on its axes? time on x in sec? response time in ms on the y-axis? mikeal's readme wasn't clear on this too and I'm just curious :)
[jira] Commented: (COUCHDB-754) Improve couch_file write performance
[ https://issues.apache.org/jira/browse/COUCHDB-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864486#action_12864486 ]

Adam Kocoloski commented on COUCHDB-754:

I wrote a fast and dirty patch to set the O_DSYNC flag via couch_icu_driver instead of calling fsync. Later I'll submit a cleaned-up version that only activates when delayed_commits = false. Here are the results of a relaximation writer comparison test against tr...@940992:

http://mikeal.couchone.com/graphs/_design/app/_show/compareWriteTest/c34d5d47f99e11be1f591832d0004d64

So clearly the O_DSYNC approach is much faster than calling file:sync/1 after every write. I confirmed that the fcntl actually did have an effect; append_bin operations with O_DSYNC set were taking ~600 µs as opposed to ~100 µs without the flag on trunk.

I have no idea what kind of data integrity guarantees we get with O_DSYNC on OS X. Is it equivalent to an fsync(), or to an fcntl(F_FULLFSYNC)? If it's equivalent to an fcntl(F_FULLFSYNC) this is a no-brainer. It's also a no-brainer on Linux.
[jira] Created: (COUCHDB-759) rewriter should be securely jailed in a single database by default
rewriter should be securely jailed in a single database by default

Key: COUCHDB-759
URL: https://issues.apache.org/jira/browse/COUCHDB-759
Project: CouchDB
Issue Type: Bug
Reporter: Chris Anderson

This will allow us to isolate databases using vhosts and the browser's same-origin policy.
[jira] Commented: (COUCHDB-754) Improve couch_file write performance
[ https://issues.apache.org/jira/browse/COUCHDB-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864382#action_12864382 ]

Adam Kocoloski commented on COUCHDB-754:

Regarding the O_(D)SYNC flag, I noticed that bitcask relies on the undocumented structure of an Erlang file opened in raw mode to get the real FD

{file_descriptor, prim_file, {_Port, RealFd}} = FD

and then they have a NIF to call fcntl and add O_SYNC to flags for that fd. I suppose we could do the same without a NIF (given that we are supporting older Erlang VMs)
[jira] Updated: (COUCHDB-758) Remote replication fails for any database, push or pull.
[ https://issues.apache.org/jira/browse/COUCHDB-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Hunt updated COUCHDB-758:

Summary: Remote replication fails for any database, push or pull. (was: Remove replication fails for any database, push or pull.)

> Remote replication fails for any database, push or pull.
>
> Key: COUCHDB-758
> URL: https://issues.apache.org/jira/browse/COUCHDB-758
> Project: CouchDB
> Issue Type: Bug
> Components: Replication
> Affects Versions: 0.11
> Environment: Windows
> Reporter: Chris Hunt
> Attachments: CouchReplLog.txt
>
> All attempts to replicate fail in either direction for any database. The
> message is "Replication failed: undefined"
[jira] Updated: (COUCHDB-758) Remove replication fails for any database, push or pull.
[ https://issues.apache.org/jira/browse/COUCHDB-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Hunt updated COUCHDB-758:

Attachment: CouchReplLog.txt
[jira] Created: (COUCHDB-758) Remove replication fails for any database, push or pull.
Remove replication fails for any database, push or pull.

Key: COUCHDB-758
URL: https://issues.apache.org/jira/browse/COUCHDB-758
Project: CouchDB
Issue Type: Bug
Components: Replication
Affects Versions: 0.11
Environment: Windows
Reporter: Chris Hunt

All attempts to replicate fail in either direction for any database. The message is "Replication failed: undefined"
Re: buildbot failure in ASF Buildbot on couchdb-trunk
I already alerted gmcdonald to that. Not sure what the underlying cause is, but it's happened before.

On Wed, May 5, 2010 at 7:17 AM, Adam Kocoloski wrote:
> Thanks Paul, I tried that, but it looks like the buildbot failed to clean up
> after itself from the previous attempt:
>
> http://ci.apache.org/builders/couchdb-trunk/builds/308/steps/svn/logs/stdio
>
> What's the next step?
Re: buildbot failure in ASF Buildbot on couchdb-trunk
Thanks Paul, I tried that, but it looks like the buildbot failed to clean up after itself from the previous attempt:

http://ci.apache.org/builders/couchdb-trunk/builds/308/steps/svn/logs/stdio

What's the next step?

On May 4, 2010, at 10:59 PM, Paul Davis wrote:

> Theoretically, on IRC, it's:
>
> couchbot: force build ${BUILDER}
>
> Where ${BUILDER} can currently be couch-trunk or couch-cover.
>
> I know of spurious errors when both builds run at the same time due to
> our tests not using random ports when we start a full server. (Both
> try and grab port 5984, one fails)
>
> Though currently, couchbot has gone missing. I think there's an email
> endpoint but I don't know it off the top of my head.
>
> HTH,
> Paul Davis
>
> On Tue, May 4, 2010 at 10:10 PM, Adam Kocoloski wrote:
>> How do I force another build? I can't reproduce the test failure locally.
>>
>> Adam
>>
>> On May 4, 2010, at 4:59 PM, build...@apache.org wrote:
>>
>>> The Buildbot has detected a new failure of couchdb-trunk on ASF Buildbot.
>>> Full details are available at:
>>> http://ci.apache.org/builders/couchdb-trunk/builds/306
>>>
>>> Buildbot URL: http://ci.apache.org/
>>>
>>> Buildslave for this Build: bb-vm_ubuntu
>>>
>>> Build Reason:
>>> Build Source Stamp: [branch couchdb/trunk] 941033
>>> Blamelist: kocolosk
>>>
>>> BUILD FAILED: failed compile_5
>>>
>>> sincerely,
>>> -The ASF Buildbot
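On the port clash Paul mentions (both builds trying to grab 5984), one common way to avoid it is to let the operating system pick a free port by binding to port 0. The Python sketch below only illustrates that general technique; it is not how the CouchDB test suite is written, and the bind_ephemeral helper is a hypothetical name.

```python
import socket


def bind_ephemeral(host: str = "127.0.0.1"):
    """Bind a listening TCP socket to an OS-assigned free port.

    Returns the socket and the port number that was actually chosen,
    so a test harness can pass that port on to whatever it starts.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the OS for any free port
    sock.listen(1)
    return sock, sock.getsockname()[1]
```

Two builds that each call a helper like this at the same time get different ports, so neither fails just because the other is already listening.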