Re: Attachment level replication
From what you describe, it sounds like an excellent idea. Do you have a clear idea of what the new APIs would be, and whether they're potentially useful outside of the new replicator? Also, I don't think it was explicitly mentioned, but is this slated to make it into a 0.10.x release?

Ben
[jira] Created: (COUCHDB-400) Allow Distributed Erlang Short/Long Name and Cookie to be Specified in Configuration Files
Allow Distributed Erlang Short/Long Name and Cookie to be Specified in Configuration Files

Key: COUCHDB-400
URL: https://issues.apache.org/jira/browse/COUCHDB-400
Project: CouchDB
Issue Type: Improvement
Components: Infrastructure
Reporter: Ben Browning
Priority: Minor

It would be nice to allow specifying the Erlang short or long name and cookie via the ini files and have CouchDB read those values and set up distributed Erlang appropriately. This is useful for anyone using Hovercraft, as well as for a potential couchctl script to interact with a running CouchDB node via the command line. An alternative to specifying these in the .ini files would be to allow doing it via startup scripts, but the ini files feel like a cleaner solution to me.

-- This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[jira] Updated: (COUCHDB-400) Allow Distributed Erlang Short/Long Name and Cookie to be Specified in Configuration Files
[ https://issues.apache.org/jira/browse/COUCHDB-400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Browning updated COUCHDB-400:

Attachment: couchdb_400.patch

First stab at setting up distributed Erlang via the ini files.
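A hypothetical shape for such an ini section (the section and key names below are illustrative only, not necessarily what the attached patch uses):

```ini
; Hypothetical [erlang] section for local.ini - all names illustrative.
[erlang]
; "shortnames" or "longnames", mapping to erl's -sname / -name flags
name_type = shortnames
; the node name; something like couchdb@host.example.com for longnames
name = couchdb
; the distributed Erlang cookie shared by nodes that should connect
cookie = secret
```

At startup CouchDB would read these values and call the equivalent of net_kernel:start/1 and erlang:set_cookie/2 to bring up distribution.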
[jira] Created: (COUCHDB-366) Error Uploading Attachment
Error Uploading Attachment

Key: COUCHDB-366
URL: https://issues.apache.org/jira/browse/COUCHDB-366
Project: CouchDB
Issue Type: Bug
Components: Database Core
Affects Versions: 0.10
Environment: Ubuntu 8.04, CouchDB Trunk rev 779803
Reporter: Ben Browning

Traceback attached.

-- This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[jira] Updated: (COUCHDB-366) Error Uploading Attachment
[ https://issues.apache.org/jira/browse/COUCHDB-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Browning updated COUCHDB-366:

Attachment: attachment_traceback.txt
[jira] Updated: (COUCHDB-366) Error Uploading Attachment
[ https://issues.apache.org/jira/browse/COUCHDB-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Browning updated COUCHDB-366:

Attachment: bespin.zip

CouchApp that produces the error.

20:21 davisp damienkatz: uploading a large attachment ends up throwing a function_clause error on split_iolist and the parameter types are binary(), int(), [binary()]
[jira] Updated: (COUCHDB-366) Error Uploading Attachment
[ https://issues.apache.org/jira/browse/COUCHDB-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Browning updated COUCHDB-366:

Attachment: couchdb-366-test.patch

Attached a JavaScript test case to reproduce the error.
Re: Jinterface
On Fri, May 22, 2009 at 11:46 PM, Evan ev...@michevan.id.au wrote:

> Anybody seen JInterface before? http://erlang.org/doc/apps/jinterface/

I've used JInterface in a simple, custom Java client library before and was able to insert and retrieve docs substantially faster than with the HTTP client libraries. The speed comes at a cost, though, and for typical apps it makes more sense to use the HTTP libraries. I've started rewriting this as a more generic, reusable client library but have some other higher-priority projects at the moment.

> Thought it could prove useful if anyone was interested in making a Java based view server.

I think you could make a Java view server fairly easily without JInterface, since the view servers communicate with the rest of Couch over stdin/stdout. However, it does open some interesting possibilities for integrating Couch with other Java server technologies for things like httpd handlers.

Ben
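Since the view servers talk to Couch over stdin/stdout, the core of such a server is just a line-oriented JSON loop. A minimal sketch in Python (heavily simplified; the real query-server protocol has more commands and details, and a Java version would follow the same shape):

```python
import json

# Toy sketch of a line-oriented view server loop: one JSON command per
# input line, one JSON response per output line. Heavily simplified
# relative to CouchDB's real query-server protocol.
def handle(cmd, state):
    if cmd[0] == "reset":
        state["funs"] = []
        return True
    if cmd[0] == "add_fun":
        # a real server would compile the function source here
        state["funs"].append(cmd[1])
        return True
    if cmd[0] == "map_doc":
        # one (stubbed-out, empty) result list per registered map function
        return [[] for _ in state["funs"]]
    return {"error": "unknown_command"}

def run(lines):
    # in a real server, `lines` would be sys.stdin and each response
    # would be written to sys.stdout followed by a flush
    state = {"funs": []}
    return [json.dumps(handle(json.loads(line), state)) for line in lines]

responses = run([
    '["reset"]',
    '["add_fun", "function(doc) { emit(doc._id, null); }"]',
    '["map_doc", {"_id": "x"}]',
])
```

A JInterface-based server would instead exchange Erlang terms directly, which is where the speed gain in the client library came from.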
Re: [VOTE] Apache CouchDB 0.9.0 release
+1 for the 0.9 release - the current trunk is a huge step forward from the last release.

Ben
Re: Lounge clustering framework
This is great to hear. I'll set aside some time to check out the source this weekend and play with things. Thanks for the contribution to the community! Ben
Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)
To spark some additional conversation around plugins: is there a clear idea of how we would handle a plugin X that also uses a plugin Y? For example, I could see the Erlang API, the partitioning/clustering functionality, and individual pieces of the clustering (the consistent hashing algorithm being the immediately obvious one) shipping as additional plugins. The partitioning would depend on the Erlang API and on a consistent hashing plugin being configured.

For now I'm just throwing additional modules into the src/couchdb directory in my git branch, but there are enough modules in there that it makes me hesitate to add more. We could separate these into multiple folders by using Erlang packages, but a plugin system could accomplish the same goal and provide additional benefit for 3rd-party plugins.

It's not a pressing priority, but getting a plugin system in place would better allow for 3rd-party development around CouchDB and make it easier to release and version functionality separately from the core app.

Ben
Erlang API Discussion
A prerequisite for partitioning, and a generally nice-to-have feature, is an Erlang API for CouchDB. Let's get the discussion going; I'd love to see an initial version of the API included with the 0.10 release.

The API will overlap quite a bit with the code in couch_http_db.erl, couch_http_view.erl, etc. I propose refactoring this code into the Erlang API methods and having the couch_http_*.erl files use the Erlang API. This helps us reduce code duplication and will allow us to test the Erlang API with the existing HTTP tests.

For the initial version of the API, I'd prefer to expose only a subset of the HTTP API. Document CRUD methods are an obvious choice, and at least a basic form of view queries would be nice. The Erlang API needs to be as stable as, if not more stable than, the HTTP API - I'm open to implementing more or less pending reasonable discussion.

The parameters passed in and the data returned from the API should be well-defined. I propose that any complex data structures given as input or returned from the API be records defined in couch_db.hrl or another appropriate place. There are already records defined there for documents and view query arguments, which might be all we need.

This should be enough to spark an initial discussion. I'll create a page on the wiki for the API proposal to consolidate the information once other people give their input.

Thanks,
Ben
Re: Erlang API Discussion
On Thu, Mar 5, 2009 at 7:29 PM, Chris Anderson jch...@apache.org wrote:

> I'm not sure having the HTTP wrapper use the API is the best plan, as it might turn out to be indirection for indirection's sake. If it turns out to be simpler to use the Erlang API, then of course let's do it, but if it is slower or more confusing, then we shouldn't feel like we have to.

I'm fine with this, with the caveat that I'd like to reduce code duplication as much as practical. A good first step could be to write the Erlang API and later modify the HTTP API to use it if that makes sense.

> +1 to keeping it limited to document CRUD on the first round. View queries will be harder to model as they rely on sending side-effects out the HTTP socket (maybe replace the http socket with a gen-server reply?)

I've been trying to figure out for the last few days exactly how view queries would work in the Erlang API. If we don't have the HTTP API rely on the Erlang API, this becomes much easier, and I'm sure we could come up with a good solution.

> Internally, all database operations are updates, and they are all bulk (sometimes with a bulk size of 1). I'm not sure how much we want to hide this from the user. It might be better to keep the API wrapper thin, so the user sees this. Then we can potentially add wrappers so you don't have to remember what the structure of a delete is, for instance.

I like the idea of growing the API as needed. It sounds like you feel we could get by with an initial API that just allows you to retrieve a document and update a document. Is that accurate?
[jira] Updated: (COUCHDB-216) Make couchdb adhere more to OTP guidelines
[ https://issues.apache.org/jira/browse/COUCHDB-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Browning updated COUCHDB-216:

Attachment: 0001-Generate-the-modules-section-of-couch.app-via-the-Ma.patch

The modules section of couch.app was outdated. Instead of updating it manually, I've modified the Makefile to populate it based on the modules actually present when building. This should keep it from getting out of date in the future.

Make couchdb adhere more to OTP guidelines

Key: COUCHDB-216
URL: https://issues.apache.org/jira/browse/COUCHDB-216
Project: CouchDB
Issue Type: Improvement
Reporter: Martin S
Attachments: 0001-add-couch_app-and-couch_sup.patch, 0001-Generate-the-modules-section-of-couch.app-via-the-Ma.patch, 0002-add-missing-registered-process-names.patch, 0003-make-couchdb-startup-script-use-couch.rel.-load-conf.patch

CouchDB could adhere to OTP standards in a better way. Currently:
- couch.app is not up to date
- couch_server.erl is an amalgam of 2 behaviours, which is considered not good

From my beginner's perception of OTP, it seems CouchDB is not treated as a running application when it is actually running. E.g. appmon doesn't show the application once CouchDB is started.

-- This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
Re: Partitioning use cases
David,

On Fri, Feb 27, 2009 at 7:16 PM, David Van Couvering da...@vancouvering.com wrote:

> Hi, all. I did a first pass at the high-level use cases, both to start with and then looking forward. http://wiki.apache.org/couchdb/Partitioning_proposal

Thanks for adding to the document :)

> I'm sure I'm missing some things (in particular, I don't fully understand the discussion about configuring the topology differently depending upon performance needs - I wonder if that's an implementation-specific detail).

From what I understand, the initial idea was to have the partitioning be fairly static and not try to tackle all the dynamic challenges up-front. So initially a topology would be decided upon based on the needs of a particular dataset (data-intensive, map/reduce-intensive, etc.) and would remain fairly stable.

> One way perhaps I could help is to work on an API that maps a key to a node (with some sort of pluggable interface for various consistent hashing algorithms, starting maybe with Chord, given that there is an existing implementation in Erlang called Chordial - http://dawsdesign.com/drupal/chordial)

I think having a pluggable interface for choosing the consistent hashing algorithm is a good idea. Chord looks very nice for systems where nodes are dynamically added and removed all the time. I think that's a bit more advanced than what the initial implementation is shooting for, but it maps very well to the long-term goal of CouchDB as a true distributed database. These are just my thoughts, so if they don't match the opinions of the rest of the community, please speak up.

Ben
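As a concrete illustration of such a key-to-node mapping, here is a minimal consistent-hash ring sketch in Python (illustrative only - this is the generic ring construction, not Chord, and none of these names come from any CouchDB code):

```python
import bisect
import hashlib

# Minimal consistent-hash ring: each node owns several virtual points
# on the ring; a docid maps to the owner of the first point at or
# after the docid's hash. Adding or removing one node only remaps the
# keys adjacent to that node's points, not the whole keyspace.
class Ring:
    def __init__(self, nodes, vnodes=64):
        self._points = sorted(
            (self._hash("%s-%d" % (node, i)), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._points]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, docid):
        # wrap around the ring with the modulo
        i = bisect.bisect(self._keys, self._hash(docid)) % len(self._points)
        return self._points[i][1]
```

A pluggable interface would then amount to letting the hash function and point-placement strategy be supplied by the chosen algorithm (plain ring, Chord, etc.).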
Re: Stats
It happens to the best of us - thanks for fixing it so quickly.

On Sun, Feb 22, 2009 at 10:37 AM, Damien Katz dam...@apache.org wrote:

> Heh, I've done that too. -Damien
>
> On Feb 22, 2009, at 10:31 AM, Jan Lehnardt wrote:
>
>> Cheers Ben, indeed, the files didn't make it into the commit. They are now in as of r746734. Thanks again. Jan
Re: Partitioned Clusters
On Fri, Feb 20, 2009 at 7:34 PM, Chris Anderson jch...@apache.org wrote:

> I think so. I think that there could be proxy overlap / redundancy across all levels of the tree, and also in the case of a flat tree. As long as the proxies agree on how to hash from URLs to nodes it should just work.

I've been thinking about how to address the issue of allowing different configurations for different needs. I think if all we do is tell a proxy node who its children are and how to map IDs to those children, and allow a proxy to also be a node, we can handle almost any configuration. Examples:

* All peers - 2 nodes in the system, A and B. A is configured so odd IDs map to A and even IDs map to B. B is configured with the same ID ranges. You can load-balance across nodes A and B and take advantage of increased write throughput. This is probably the simplest clustering scenario for people who don't have enough traffic to fully utilize a standalone proxy node.

* 1 or more proxies, multiple nodes - The proxies are all configured identically to map document IDs among nodes A-J. Nodes A-J know nothing about each other or their parents. In this scenario you can very easily add proxy nodes as needed to handle the increased load when aggregating results from more nodes.

* Tree structure - The top-level proxies are configured to map document IDs to nodes. These nodes may in fact be other proxies, which are then configured to map to their own nodes. Except for the multiple levels of proxies, this is the same as the above scenario.

Does it sound reasonable to expect a proxy to be aware of its children but not vice versa? In an actual implementation I see the list of children and their mappings being stored in a document so that it could be updated while running to add/remove children. Adding a child in this scenario would involve choosing an ID range, replicating the relevant data from the other children, and updating this mapping.
This would depend on partial replication to replicate only the data needed for the new child. I don't see this as something that's too complex - the only issue I see is that you'll probably need to replicate the data at least twice: once before the proxy mapping is updated, and once after to pick up any final data that was written to the other children since the first replication. This also assumes you've chosen a consistent hashing algorithm, so that the data on all nodes doesn't have to change when adding a single new node. Removing a child node would be the opposite process.

I could foresee us coming up with a tool to automate most if not all of this process, possibly only requiring the user to start the new CouchDB server, fill in some values in Futon for the ID mappings, and press a button. Sound reasonable?

- Ben
Re: Partitioned Clusters
Overall the model sounds very similar to what I was thinking. I just have a few comments.

> In this model documents are saved to a leaf node depending on a hash of the docid. This means that lookups are easy, and need only to touch the leaf node which holds the doc. Redundancy can be provided by maintaining R replicas of every leaf node.

There are several use-cases where a true hash of the docid won't be the optimal partitioning key. The simple case is where you want to partition your data by user, and in most non-trivial cases you won't be storing all of a user's data under one document with the user's id as the docid. A fairly simple solution would be allowing the developer to specify a javascript function somewhere (not sure where this should live...) that takes a docid and spits out a partition key. Then I could just prefix all my doc ids for a specific user with that user's id and write the appropriate partition function.

> View queries, on the other hand, must be handled by every node. The requests are proxied down the tree to leaf nodes, which respond normally. Each proxy node then runs a merge sort algorithm (which can sort in constant space proportional to # of input streams) on the view results. This can happen recursively if the tree is deep.

If the developer has control over partition keys as suggested above, it's entirely possible to have applications where view queries only need data from one partition. It would be great if we could do something smart here, or have a way for the developer to indicate to Couch that all the data should be on only one partition. These are just nice-to-have features, and the described cluster setup could still be extremely useful without them.

The tree setup sounds interesting, but I wonder how it would compare in latency to a flat setup with the same number of leaf nodes. As long as the developer can control the tree structure (# of children per parent), this concern shouldn't be an issue.

- Ben
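The constant-space merge described above maps directly onto a k-way heap merge. A small sketch in Python, assuming each leaf node returns its view rows already sorted by key:

```python
import heapq

# Merge sorted view-row streams from several leaf nodes, using heap
# space proportional to the number of streams - the proxy never has
# to buffer a whole result set.
def merge_view_rows(streams):
    return heapq.merge(*streams)

node_a = [(1, "x"), (4, "y")]   # (key, value) rows from one leaf node
node_b = [(2, "p"), (3, "q")]   # rows from another leaf node
merged = list(merge_view_rows([node_a, node_b]))
```

A proxy higher up the tree can feed the merged iterator from one level straight into the merge at the next level, which is what keeps the recursive case cheap.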
Re: Couch clustering/partitioning Re: CouchSpray - Thoughts?
I'd be very interested in any code you could release. If that's not possible, then your experiences and the challenges you faced while implementing this cluster would definitely be a great help to the community. It's encouraging to see someone with a working cluster already up and running.

Ben