+1

--
,,,^..^,,,
On Wed, May 8, 2013 at 11:20 PM, Garren Smith <gar...@apache.org> wrote:
> +1
>
> On 08 May 2013, at 7:04 PM, Joan Touzet <woh...@apache.org> wrote:
>
>> +1
>>
>> On Tue, May 07, 2013 at 04:02:09PM -0500, Paul Davis wrote:
>>> +1
>>>
>>> On Tue, May 7, 2013 at 3:52 PM, Russell Branca <chewbra...@gmail.com> wrote:
>>>> +1
>>>>
>>>> Very excited to see this! Great work!
>>>>
>>>> -Russell
>>>>
>>>> On Tue, May 7, 2013 at 1:44 PM, Robert Newson <rnew...@apache.org> wrote:
>>>>
>>>>> FYI: A zip of this work is available at
>>>>> http://people.apache.org/~rnewson/dist/nebraska-merge-candidate.zip
>>>>> made by 'git archive -o nebraska-merge-candidate.zip
>>>>> nebraska-merge-candidate'
>>>>>
>>>>> On 7 May 2013 21:34, Robert Newson <rnew...@apache.org> wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I propose to merge in the following work,
>>>>>> https://github.com/rnewson/couchdb/tree/nebraska-merge-candidate, into
>>>>>> the official Apache CouchDB repository on a new branch (i.e., *not*
>>>>>> master). Once there, the full CouchDB developer community can begin
>>>>>> the work to incorporate the code here into an official release.
>>>>>>
>>>>>> You do not need to respond if you are in agreement. If there is no
>>>>>> response in 72 hours, I will assume lazy consensus. If we reach
>>>>>> consensus, I will start the IP clearance process and then the merge.
>>>>>>
>>>>>> As most of you know, Paul Davis and I recently sequestered ourselves
>>>>>> away from society (in a place called Nebraska) to make this merge
>>>>>> happen. I want to clarify that this work is not the BigCouch code you
>>>>>> can see on github.com/cloudant/bigcouch but the Cloudant platform from
>>>>>> which BigCouch was made. This means it is bang up to date with all the
>>>>>> bug fixes and feature enhancements we've made in the last eighteen
>>>>>> months or more.
>>>>>> With that clarification made, here are our notes about
>>>>>> what we achieved, what it means to the project and what isn't yet
>>>>>> done:
>>>>>>
>>>>>> Nebraska Merge Roundup
>>>>>>
>>>>>> Stats:
>>>>>>
>>>>>> 1402 - total new commits
>>>>>>
>>>>>> 312 - commits written during the merge (will be reduced substantially
>>>>>> by squashing)
>>>>>>
>>>>>> 408 - number of files changed
>>>>>>
>>>>>> 21,897 - number of lines added
>>>>>>
>>>>>> 4,277 - number of lines removed
>>>>>>
>>>>>> A retrospective:
>>>>>>
>>>>>> Bob Newson and I have come to the end of our merge sprint on getting
>>>>>> BigCouch merged into Apache CouchDB. It’s been a productive ten days
>>>>>> here in the Midwest. I managed to get Bob out to a bowling alley and
>>>>>> he managed to get me to a sushi restaurant. In between the cultural
>>>>>> exchanges we’ve also managed to get a significant amount of work done
>>>>>> on the merge.
>>>>>>
>>>>>> The current status of the merge is that we’ve managed to resolve the
>>>>>> differences in the single node execution of CouchDB. Both the
>>>>>> JavaScript and Erlang test suites run with only one failure in the
>>>>>> Erlang test suite, due to a (deliberately) missing constraint on the
>>>>>> number of operating system processes. This should be a relatively
>>>>>> straightforward fix, but it was not prioritized in the limited time we
>>>>>> had, which we spent on the larger issues.
>>>>>>
>>>>>> We merged a large number of performance and stability enhancements
>>>>>> back into single node CouchDB as well as a number of pure bug fixes.
>>>>>> The biggest highlight is a brand new compactor that is both faster and
>>>>>> creates smaller and better organized post-compaction databases.
>>>>>>
>>>>>> Single node operations should therefore be completely unaffected by
>>>>>> the merge, as demonstrated by the passing test suite.
>>>>>> On the other hand, we haven’t yet finished updating the merged
>>>>>> clustered code to use some of the new changes in single node CouchDB.
>>>>>> The single most significant portion of this work involves updating the
>>>>>> internal cluster API for views to use the recently rewritten indexer
>>>>>> APIs. This should be a relatively straightforward bit of work that
>>>>>> we’ll be finishing over the next few weeks.
>>>>>>
>>>>>> All in all, the merge work done so far has been quite successful. We’ve
>>>>>> met our primary goal of getting the code merged in a fashion that does
>>>>>> not affect single node operation while providing a starting point for
>>>>>> the larger community to start reviewing the more significant changes
>>>>>> made. Given the size of the diff between the two code bases, we never
>>>>>> expected to have a fully working clustered solution after ten days of
>>>>>> work, but we have succeeded in providing a base of work that will allow
>>>>>> us and new contributors to get up to speed quickly.
>>>>>>
>>>>>> This work, coupled with work by Dave Cottlehuber and Benoît Chesneau
>>>>>> on updating the build system and various other internal updates, will
>>>>>> provide a solid foundation for work going forward. It’s an exciting
>>>>>> time for CouchDB, and anyone interested should keep an eye on the next
>>>>>> few releases as we ramp up work on various core aspects of the
>>>>>> database.
>>>>>>
>>>>>> We’ve had an exciting few days preparing the road for the next twelve
>>>>>> to eighteen months of Apache CouchDB. We hope that everyone feels as
>>>>>> excited about them as we do. It should be quite a ride.
>>>>>>
>>>>>> Things we got done
>>>>>>
>>>>>> * Large update to the source tree layout for Erlang applications. Each
>>>>>> application now has a src/appname/(c_src|ebin|priv|src) structure. The
>>>>>> build system has been updated to match.
>>>>>>
>>>>>> * Renamed src/couchdb to src/couch to match the Erlang convention of
>>>>>> the top directory name matching the Erlang application name.
>>>>>>
>>>>>> * Imported Cloudant Erlang applications for clustered CouchDB. These
>>>>>> are imported with their history by using git subtree and merging the
>>>>>> top level commit. These are not external deps; development will happen
>>>>>> within the CouchDB tree. The imported apps are:
>>>>>>
>>>>>>   * config - A couch_config replacement (behavior is mostly identical
>>>>>>     to couch_config except for how we listen for configuration changes
>>>>>>     internally, to allow for smooth hot code upgrades).
>>>>>>
>>>>>>   * twig - An rsyslog source replacement for couch_log.
>>>>>>
>>>>>>   * rexi - An RPC library. Replaces Erlang’s built-in rex application
>>>>>>     to avoid costly safety measures in the interest of performance and
>>>>>>     throughput.
>>>>>>
>>>>>>   * mem3 - The “Dynamo” part of BigCouch, responsible for managing
>>>>>>     cluster state.
>>>>>>
>>>>>>   * fabric - The internal cluster-aware CouchDB API.
>>>>>>
>>>>>>   * ets_lru - A small library application that provides an LRU
>>>>>>     implementation using a couple of ets tables.
>>>>>>
>>>>>>   * ddoc_cache - Caches design documents on each node for use in
>>>>>>     design handler functions. This uses an ets_lru cache with a very
>>>>>>     short TTL.
>>>>>>
>>>>>>   * chttpd - The cluster-aware HTTP layer.
>>>>>>
>>>>>> Each imported app also had its build system updated to use Autotools,
>>>>>> along with the same layout changes noted above for the existing
>>>>>> CouchDB Erlang apps.
>>>>>>
>>>>>> * Merged a large number of updates and fixes to couch_replicator based
>>>>>> on work done internally at Cloudant. Unfortunately, due to an error
>>>>>> when we created our internal clone, we lost some history from the
>>>>>> initial merge, leaving one big commit that mostly affects
>>>>>> couch_replicator_manager.
>>>>>> There are a number of other commits related to couch_replicator that
>>>>>> resolve the single node vs. clustered differences. Some notable
>>>>>> couch_replicator features:
>>>>>>
>>>>>>   * Optionally disable checkpoints so that replication can work when
>>>>>>     a source is read-only. This should only be used for smaller
>>>>>>     databases, as every replication run has to scan the entire source
>>>>>>     database.
>>>>>>
>>>>>>   * A new changes_pending field in the _active_tasks output.
>>>>>>
>>>>>>   * A fix to continuous replication to automatically reconnect to
>>>>>>     a continuous changes feed when it sees a last_seq value. This
>>>>>>     allows the source to selectively recycle the HTTP connections
>>>>>>     used, which can be quite useful for “permanent” replications.
>>>>>>
>>>>>>   * A multitude of smaller bug fixes and stability enhancements.
>>>>>>
>>>>>> Updates to single node couch:
>>>>>>
>>>>>> * We changed the by_seq tree to store a copy of the #full_doc_info{}
>>>>>> record instead of the #doc_info{} record. This gives significant speed
>>>>>> improvements for compaction and replication and generally anything
>>>>>> that needs to walk the by_seq tree and access document bodies
>>>>>> internally.
>>>>>>
>>>>>> * We rewrote the compactor to be significantly faster as well as to
>>>>>> produce significantly better compacted databases. The two main halves
>>>>>> of the work are using a temp file, and replacing the use of btrees in
>>>>>> that temp file. The temp file only contains a temporary copy of the
>>>>>> document ids. At the end of a compaction run we then rebuild the by_id
>>>>>> btree in the compaction file from this temp file. The reason this
>>>>>> helps so much is that compaction is driven by the update_seq btree,
>>>>>> which in most cases means that the id tree is updated in roughly
>>>>>> random order, which is very bad for our append-only btrees.
>>>>>> By using the temp file we can stream the ids back, in order, into the
>>>>>> compacted db file at the end of compaction, generating a minimal
>>>>>> amount of garbage in the process. The other upgrade was to implement
>>>>>> an external merge sort module (couch_emsort) that is used with this
>>>>>> temporary file.
>>>>>>
>>>>>> * Reject updates to design docs that would break compilation of their
>>>>>> source code. Currently we only check map and reduce functions, since
>>>>>> failures in the other function types should already surface as user
>>>>>> visible errors, whereas a broken map or reduce function just produces
>>>>>> inexplicably empty views.
>>>>>>
>>>>>> because my OCD kicked in and I was unable to resist.
>>>>>>
>>>>>> * Reverted a change made a long time ago that used two file
>>>>>> descriptors for each database. See the todo list.
>>>>>>
>>>>>> * The reason to remove the second fd is so that we can rewrite ref
>>>>>> counting. Better ref counting makes everyone happy, but the real
>>>>>> reason is the next bullet point:
>>>>>>
>>>>>> * Optimize couch_server to not require a round trip message pass for
>>>>>> opening a database that’s in the LRU. This is a significant
>>>>>> performance boost for high concurrency access. We also optimized
>>>>>> couch_server internals so that it doesn’t blow up under load.
>>>>>>
>>>>>> * Introduce a #leaf{} record into the revision trees. This is never
>>>>>> written to disk but makes internal code a lot cleaner when dealing
>>>>>> with multiple versions of rev tree values.
>>>>>>
>>>>>> * Some changes to couch_changes to enable clustered access, plus some
>>>>>> general cleanup.
>>>>>>
>>>>>> * Internal changes to how CouchDB is booted in Erlang land. Not very
>>>>>> sexy, but this removes a lot of complicated un-Erlangy bits. We still
>>>>>> have a bit of work left here.
>>>>>>
>>>>>> * btree chunk sizes are now configurable, which lets people adjust the
>>>>>> RAM/speed tradeoffs a bit more.
>>>>>>
>>>>>> * We now load update validation functions on the first write.
>>>>>> This is a cluster-motivated change, because the clustered version of
>>>>>> this call is expensive and can lead to race conditions when opening a
>>>>>> bunch of db shards simultaneously. This should be invisible to
>>>>>> external clients.
>>>>>>
>>>>>> * Disabled conflict detection for local docs. They don’t replicate, so
>>>>>> there’s no point. This just led to clusters getting stuck and confused
>>>>>> when there were lots of replications happening.
>>>>>>
>>>>>> * Changes to the multipart/mime parsing code. Necessary for clustered
>>>>>> attachment uploads to split the incoming data stream into N copies.
>>>>>>
>>>>>> * Don’t use init:restart/0 when reloading the ICU driver. I think
>>>>>> this has a bug. But we should rewrite this driver to be a NIF anyway.
>>>>>>
>>>>>> * New couch OS process manager. Significantly faster access to OS
>>>>>> processes under heavy load. This replaces the hard limit with a soft
>>>>>> limit. Processes spawned over the soft limit will be used until
>>>>>> they’ve sat idle for a few minutes, and then closed. We have a todo
>>>>>> item to add the hard ceiling back in (while keeping the soft ceiling).
>>>>>>
>>>>>> * Automatically replace some easily identifiable JS reductions with
>>>>>> their builtin counterparts. Detection uses a regex, so it’s not too
>>>>>> smart.
>>>>>>
>>>>>> * Improved the view updater’s write batching.
>>>>>>
>>>>>> * Updates to couchjs’ views.js to improve index update speeds.
>>>>>>
>>>>>> * Updates to the _stats builtin reduce to allow reduces to work over
>>>>>> emitted stats objects. Sometimes clients have summary data in a doc,
>>>>>> and this allows them to combine stats if they follow the same pattern
>>>>>> as the builtin expects.
>>>>>>
>>>>>> * Added a config:reload() that is accessible by POST’ing to
>>>>>> _config/_reload. Used by the JS tests to reset the config to what's on
>>>>>> disk.
>>>>>> This should prevent those test run failures where a test fails,
>>>>>> leaving the config in a bad state and causing all subsequent tests to
>>>>>> fail. I think. Maybe.
>>>>>>
>>>>>> * Databases are deleted synchronously in the test suite. We may need
>>>>>> to address this on Windows, but it does seem to reduce the number of
>>>>>> “{error, file_exists}” failures.
>>>>>>
>>>>>> * I reimplemented the JS restartServer() function. There’s a new
>>>>>> _restart/token URL that gives a unique value for each instance of
>>>>>> the Erlang VM. To run a restart we grab the current token value, hit
>>>>>> _restart, then wait until we get a successful response with a
>>>>>> different token. This appears to have made the restart strategy more
>>>>>> robust.
>>>>>>
>>>>>> Things that need doing
>>>>>>
>>>>>> IP Clearance -
>>>>>>
>>>>>> We’ll need to track down whether we have the CCLA, as well as look at
>>>>>> each source file added to make sure each one is strictly from Cloudant
>>>>>> or has an amenable license. I’m pretty sure that the only one of
>>>>>> interest is trunc_io.erl, but we need to be thorough.
>>>>>>
>>>>>> Documentation -
>>>>>>
>>>>>> There shouldn’t be much here, since the entire point of this merge was
>>>>>> to not change the visible behavior of single node couch. A few things
>>>>>> to add about the testing endpoints. Maybe an update to the compaction
>>>>>> section to mention the two new file names used.
>>>>>>
>>>>>> Copyright notices -
>>>>>>
>>>>>> We need to strip out copyright notices from individual files and make
>>>>>> sure all files have a standard Apache License v2 header.
>>>>>>
>>>>>> Clustered vhosts -
>>>>>>
>>>>>> We’ve never implemented this at Cloudant. We either need to implement
>>>>>> clustered vhosts or go back and tell people to use HAProxy (or
>>>>>> similar) for such things.
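The restartServer() strategy described in the single node updates above (grab the current _restart/token value, hit _restart, then wait for a successful response carrying a different token) can be sketched in Python as follows. This is not the test suite's JavaScript; the function name `restart_and_wait` and its two callables are hypothetical stand-ins for the HTTP requests, because the email does not say how `_restart/token` encodes its value.

```python
import time
from typing import Callable


def restart_and_wait(
    get_token: Callable[[], str],
    trigger_restart: Callable[[], None],
    timeout: float = 30.0,
    interval: float = 0.25,
) -> str:
    """Restart a server and block until it reports a *different* instance token.

    get_token: hypothetical callable returning the current token (for
        example from _restart/token); it should raise OSError (such as
        ConnectionError or urllib's URLError) while the server is unreachable.
    trigger_restart: hypothetical callable that asks the server to restart
        (for example by hitting _restart).
    """
    old_token = get_token()
    try:
        trigger_restart()
    except OSError:
        # The connection may drop as the VM goes down; keep waiting anyway.
        pass

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            token = get_token()
        except OSError:
            token = None  # server is still coming back up
        if token is not None and token != old_token:
            return token  # a new VM instance answered successfully
        time.sleep(interval)
    raise TimeoutError("server did not come back with a new token")
```

Comparing tokens rather than just waiting for any successful response is what makes this robust: a reply from the old VM, sent before it finished shutting down, still carries the old token and is ignored.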
>>>>>>
>>>>>> twig -
>>>>>>
>>>>>> We need to add another output type to twig that is configurable in
>>>>>> some manner. Right now we spit out entire rsyslog records, which isn’t
>>>>>> useful for most people. We’ll need to implement the file writer from
>>>>>> couch_log as well as update the _log HTTP handler to know when it can
>>>>>> and can’t expect to find data on disk.
>>>>>>
>>>>>> fabric -
>>>>>>
>>>>>> This is going to need a lot of work. Specifically, view access is
>>>>>> going to need to be updated to work with couch_mrview and friends.
>>>>>>
>>>>>> Boot a dev cluster -
>>>>>>
>>>>>> Once we fix up the clustering code we’ll need to write instructions
>>>>>> and scripts for pulling up a dev cluster.
>>>>>>
>>>>>> OTP stuff -
>>>>>>
>>>>>> We’ve updated each app, but we still need to pull some parts out of
>>>>>> couchdb into their own applications. Specifically, the HTTP layer
>>>>>> needs its own app. We could probably pull out the os
>>>>>> process/query_servers as well as the os daemons and friends. Once
>>>>>> done, we need to update the supervision trees so we don’t have things
>>>>>> like couch starting and managing the replication manager process.
>>>>>>
>>>>>> ddoc_cache -
>>>>>>
>>>>>> Wire this up in couch_httpd_db so that it is actually used. Right now
>>>>>> it’s only used in chttpd.
>>>>>>
>>>>>> couch_file upgrade -
>>>>>>
>>>>>> The revert to remove the second updater_fd from each #db{} record
>>>>>> means that we’re back in the original position of files appearing to
>>>>>> slow down significantly under load. Since the initial hammer approach
>>>>>> of just adding a second fd, we’ve discovered that the underlying bug
>>>>>> is due to the way that message passing works combined with Erlang’s
>>>>>> file io. Significantly, though, the fix is rather simple to implement.
>>>>>> A first draft of this work is on an old branch of mine here:
>>>>>>
>>>>>> https://github.com/davisp/couchdb/commit/d856878
>>>>>>
>>>>>> Finish the size calculation changes -
>>>>>>
>>>>>> The #leaf{} record change is to enable us to add more data size
>>>>>> calculations. CouchDB master calculates a data size that accounts for
>>>>>> all bytes that are active in a .couch file. Cloudant is interested in
>>>>>> the total size of uncompressed docs and attachments minus the internal
>>>>>> overhead of btrees. And there’s a fourth number to calculate based on
>>>>>> the compression level used. Having each of these numbers will be
>>>>>> useful, as will the calculations they enable (e.g., dead bytes in the
>>>>>> file, bytes used for overhead, compression ratio achieved).
>>>>>>
>>>>>> couch_proc_manager -
>>>>>>
>>>>>> We need to implement the hard ceiling for capping the number of OS
>>>>>> processes. We’ve started seeing a need for this at Cloudant with some
>>>>>> workloads, so motivation to fix this is high. The only failing etap is
>>>>>> the assertion of this ceiling.
>>>>>>
>>>>>> Synchronous db delete on Windows -
>>>>>>
>>>>>> I did this because running the test suite was driving me bonkers. I
>>>>>> need to ask Dave about how this behaves on Windows (my guess is not
>>>>>> well), but I think we can close things up so that it works better than
>>>>>> the status quo.
>>>>>
>>
>> --
>> Joan Touzet | jo...@atypical.net | wohali everywhere else
>
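The OS process manager behavior described in the thread (processes spawned over a soft limit are kept until they have sat idle for a few minutes, then closed) combined with the hard ceiling requested under couch_proc_manager can be illustrated with a small pool. This is a hypothetical Python sketch, not couch_proc_manager itself (an Erlang module); the class name `ProcPool`, its parameters, returning `None` at the ceiling, and reusing the most recently idled process first are all assumptions made for illustration.

```python
import time
from typing import Any, Callable, List, Optional, Tuple


class ProcPool:
    """Hypothetical OS-process pool with a soft limit and a hard ceiling.

    - At most `hard_limit` processes exist at once; acquire() returns None
      when the ceiling is reached (a real manager would queue the caller).
    - Processes above `soft_limit` stay alive while in use and are closed by
      reap_idle() once they have sat idle for `idle_timeout` seconds.
    """

    def __init__(self, spawn: Callable[[], Any], close: Callable[[Any], None],
                 soft_limit: int, hard_limit: int, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if not 0 <= soft_limit <= hard_limit:
            raise ValueError("require 0 <= soft_limit <= hard_limit")
        self._spawn, self._close, self._clock = spawn, close, clock
        self.soft_limit, self.hard_limit = soft_limit, hard_limit
        self.idle_timeout = idle_timeout
        self._idle: List[Tuple[Any, float]] = []  # (proc, idle_since), oldest first
        self._busy = 0

    @property
    def size(self) -> int:
        return self._busy + len(self._idle)

    def acquire(self) -> Optional[Any]:
        if self._idle:
            proc, _ = self._idle.pop()  # reuse the most recently idled process
        elif self.size < self.hard_limit:
            proc = self._spawn()
        else:
            return None  # hard ceiling reached
        self._busy += 1
        return proc

    def release(self, proc: Any) -> None:
        self._busy -= 1
        self._idle.append((proc, self._clock()))

    def reap_idle(self) -> int:
        """Close long-idle processes while the pool is above the soft limit."""
        now, kept, closed = self._clock(), [], 0
        for proc, since in self._idle:  # oldest first
            over_soft = self.size - closed > self.soft_limit
            if over_soft and now - since >= self.idle_timeout:
                self._close(proc)
                closed += 1
            else:
                kept.append((proc, since))
        self._idle = kept
        return closed
```

Reusing the most recently idled process keeps a small set of processes warm and lets the extras age out, which is one simple way to get the "used until they've sat idle" behavior; the soft limit then only controls how far the pool shrinks, while the hard ceiling bounds how far it can grow.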