Re: Attachment level replication

2009-07-14 Thread Ben Browning
From what you describe, it sounds like an excellent idea.

Do you have a clear idea of what the new APIs would be and if they're
potentially useful outside of the new replicator?

Also, I don't think it was explicitly mentioned, but is this slated to
make it into a 0.10.x release?

Ben


[jira] Created: (COUCHDB-400) Allow Distributed Erlang Short/Long Name and Cookie to be Specified in Configuration Files

2009-07-04 Thread Ben Browning (JIRA)
Allow Distributed Erlang Short/Long Name and Cookie to be Specified in 
Configuration Files
--

 Key: COUCHDB-400
 URL: https://issues.apache.org/jira/browse/COUCHDB-400
 Project: CouchDB
  Issue Type: Improvement
  Components: Infrastructure
Reporter: Ben Browning
Priority: Minor


It would be nice to allow specifying the Erlang short or long name and cookie via 
the ini files and have CouchDB read those values and set up distributed Erlang 
appropriately. This is useful for anyone using Hovercraft, as well as for a 
potential couchctl script that interacts with a running CouchDB node from the 
command line.

An alternative to specifying these in the .ini files would be to allow doing it 
via startup scripts, but the ini files feel like a cleaner solution to me.
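
As a rough illustration of the idea (in Python rather than Erlang, and not the attached patch), the sketch below reads a hypothetical [erlang] section and turns it into the standard erl flags -sname, -name and -setcookie. The section name and key names are assumptions made up for this example.

```python
import configparser

# Hypothetical ini layout: the [erlang] section and its key names are
# illustrative assumptions, not settings CouchDB is known to read.
SAMPLE_INI = """
[erlang]
sname = couchdb
cookie = secret
"""


def erl_flags(ini_text):
    """Translate the hypothetical [erlang] section into erl command-line flags."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section("erlang"):
        return []
    section = parser["erlang"]
    # A node has either a short name or a long name, never both.
    if "sname" in section and "name" in section:
        raise ValueError("specify either sname or name, not both")
    flags = []
    if "sname" in section:
        flags += ["-sname", section["sname"]]
    elif "name" in section:
        flags += ["-name", section["name"]]
    if "cookie" in section:
        flags += ["-setcookie", section["cookie"]]
    return flags
```

A startup script could append the returned list to the erl command line; reading the same values from inside the running node would be the ini-only variant the issue prefers.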

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (COUCHDB-400) Allow Distributed Erlang Short/Long Name and Cookie to be Specified in Configuration Files

2009-07-04 Thread Ben Browning (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Browning updated COUCHDB-400:
-

Attachment: couchdb_400.patch

First stab at setting up distributed Erlang via ini files

 Allow Distributed Erlang Short/Long Name and Cookie to be Specified in 
 Configuration Files
 --

 Key: COUCHDB-400
 URL: https://issues.apache.org/jira/browse/COUCHDB-400
 Project: CouchDB
  Issue Type: Improvement
  Components: Infrastructure
Reporter: Ben Browning
Priority: Minor
 Attachments: couchdb_400.patch


 It would be nice to allow specifying the Erlang short or long name and cookie via 
 the ini files and have CouchDB read those values and set up distributed Erlang 
 appropriately. This is useful for anyone using Hovercraft, as well as for a 
 potential couchctl script that interacts with a running CouchDB node from the 
 command line.
 An alternative to specifying these in the .ini files would be to allow doing 
 it via startup scripts, but the ini files feel like a cleaner solution to me.




[jira] Created: (COUCHDB-366) Error Uploading Attachment

2009-05-28 Thread Ben Browning (JIRA)
Error Uploading Attachment
--

 Key: COUCHDB-366
 URL: https://issues.apache.org/jira/browse/COUCHDB-366
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Affects Versions: 0.10
 Environment: Ubuntu 8.04, CouchDB Trunk rev 779803
Reporter: Ben Browning


Traceback attached




[jira] Updated: (COUCHDB-366) Error Uploading Attachment

2009-05-28 Thread Ben Browning (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Browning updated COUCHDB-366:
-

Attachment: attachment_traceback.txt

 Error Uploading Attachment
 --

 Key: COUCHDB-366
 URL: https://issues.apache.org/jira/browse/COUCHDB-366
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Affects Versions: 0.10
 Environment: Ubuntu 8.04, CouchDB Trunk rev 779803
Reporter: Ben Browning
 Attachments: attachment_traceback.txt


 Traceback attached




[jira] Updated: (COUCHDB-366) Error Uploading Attachment

2009-05-28 Thread Ben Browning (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Browning updated COUCHDB-366:
-

Attachment: bespin.zip


CouchApp that produces the error

 Error Uploading Attachment
 --

 Key: COUCHDB-366
 URL: https://issues.apache.org/jira/browse/COUCHDB-366
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Affects Versions: 0.10
 Environment: Ubuntu 8.04, CouchDB Trunk rev 779803
Reporter: Ben Browning
 Attachments: attachment_traceback.txt, bespin.zip


 20:21 <davisp> damienkatz: uploading a large attachment ends up
 throwing a function_clause error on split_iolist and
 the parameter types are binary(), int(), [binary()]
 Traceback attached




[jira] Updated: (COUCHDB-366) Error Uploading Attachment

2009-05-28 Thread Ben Browning (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Browning updated COUCHDB-366:
-

Attachment: couchdb-366-test.patch


Attached JavaScript test case to reproduce the error

 Error Uploading Attachment
 --

 Key: COUCHDB-366
 URL: https://issues.apache.org/jira/browse/COUCHDB-366
 Project: CouchDB
  Issue Type: Bug
  Components: Database Core
Affects Versions: 0.10
 Environment: Ubuntu 8.04, CouchDB Trunk rev 779803
Reporter: Ben Browning
 Attachments: attachment_traceback.txt, bespin.zip, 
 couchdb-366-test.patch


 20:21 <davisp> damienkatz: uploading a large attachment ends up
 throwing a function_clause error on split_iolist and
 the parameter types are binary(), int(), [binary()]
 Traceback attached




Re: Jinterface

2009-05-23 Thread Ben Browning
On Fri, May 22, 2009 at 11:46 PM, Evan ev...@michevan.id.au wrote:
> Anybody seen JInterface before?  http://erlang.org/doc/apps/jinterface/


I've used JInterface in a simple, custom Java client library before
and was able to insert and retrieve docs substantially faster than with
the HTTP client libraries. The speed comes at a cost, though, and for
typical apps it makes more sense to use the HTTP libraries. I've
started rewriting this as a more generic, reusable client library but
have some other higher-priority projects at the moment.

> Thought it could prove useful if anyone was interested in making a Java
> based view server.


I think you could make a Java view server fairly easily without
JInterface since the view servers communicate with the rest of Couch
over stdin/stdout. However, it does open some interesting
possibilities for integrating Couch with other Java server
technologies for things like httpd handlers.
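
To show how little a stdin/stdout view server needs, here is a rough Python sketch of the loop (a Java version would have the same shape). The command names reset, add_fun and map_doc and the one-JSON-value-per-line framing are my understanding of the protocol and should be treated as assumptions; accepting map functions as Python lambda source is purely an invention of this example.

```python
import json


def handle_command(command, map_funs):
    """Handle one decoded command and return the value to send back."""
    name, args = command[0], command[1:]
    if name == "reset":
        map_funs.clear()
        return True
    if name == "add_fun":
        # Assumption for this sketch: the function source is a Python
        # expression such as "lambda doc: [[doc['_id'], 1]]".
        map_funs.append(eval(args[0]))
        return True
    if name == "map_doc":
        # One list of [key, value] rows per registered map function.
        return [fun(args[0]) for fun in map_funs]
    return {"error": "unknown_command", "reason": name}


def serve(stdin, stdout):
    """Read one JSON command per line and write one JSON reply per line."""
    map_funs = []
    for line in stdin:
        if not line.strip():
            continue
        reply = handle_command(json.loads(line), map_funs)
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()
```

Calling serve(sys.stdin, sys.stdout) would run it as a child process; nothing here needs a distributed Erlang connection, which is the point made above.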

Ben


Re: [VOTE] Apache CouchDB 0.9.0 release

2009-03-27 Thread Ben Browning
+1 for 0.9 release - The current trunk is a huge step forward from the
last release.


Ben


Re: Lounge clustering framework

2009-03-26 Thread Ben Browning
This is great to hear. I'll set aside some time to check out the
source this weekend and play with things. Thanks for the contribution
to the community!

Ben


Re: Simple erlang plugins (was Re: couch_gen_btree: pluggable storage / tree engines)

2009-03-22 Thread Ben Browning
To spark some additional conversation around plugins, is there a clear
idea how we would handle plugin X that also uses plugin Y? For
example, I could see the Erlang API, the partitioning/clustering
functionality, and individual pieces of the clustering as additional
plugins (consistent hashing algorithm being the immediately obvious
one). The partitioning would depend on the Erlang API and a consistent
hashing plugin being configured.
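
One way to picture the "plugin X uses plugin Y" problem is as a dependency graph that has to be loaded in order. The sketch below is only an illustration in Python: the plugin names are hypothetical, and it leans on the standard library's graphlib (Python 3.9+) to compute a load order.

```python
from graphlib import TopologicalSorter

# Hypothetical plugin names; each plugin maps to the plugins it depends on.
PLUGIN_DEPS = {
    "erlang_api": set(),
    "consistent_hashing": set(),
    "partitioning": {"erlang_api", "consistent_hashing"},
}


def load_order(deps):
    """Return plugin names ordered so dependencies come before dependents."""
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(deps).static_order())
```

With the sample graph, partitioning always comes after both of its dependencies; a circular dependency is reported as an error instead of producing an order.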

For now I'm just throwing additional modules in the src/couchdb
directory in my git branch, but there are already enough modules in
there that I hesitate to add more. We could separate these into multiple
folders by using Erlang packages, but a plugin system could accomplish
the same goal and provide additional benefit for 3rd-party plugins.

It's not a pressing priority, but getting a plugin system in place
would better allow for 3rd-party development around CouchDB and make
it easier to release and version functionality separately from the
core app.


Ben


Erlang API Discussion

2009-03-05 Thread Ben Browning
A prerequisite for partitioning and a general nice-to-have feature is
an Erlang API for CouchDB. Let's get the discussion going; I'd love to
see an initial version of the API included with the 0.10 release.

The API will overlap quite a bit with the code in couch_http_db.erl,
couch_http_view.erl, etc. I propose refactoring this code into the
Erlang API methods and having the couch_http_*.erl files use the
Erlang API. This helps us reduce code duplication and will allow us to
test the Erlang API with the existing HTTP tests.

For the initial version of the API, I'd prefer to expose only a subset
of the HTTP API. Document CRUD methods are an obvious candidate, and at
least a basic form of view queries would be nice. The Erlang API needs
to be at least as stable as the HTTP API. I'm open to implementing more
or less pending reasonable discussion.

The parameters passed in and data returned from the API should be
well-defined. I propose any complex data structures given as input or
returned from the API be records defined in couch_db.hrl or another
appropriate place. There are already records defined here for
documents and view query arguments, which might be all we need.

This should be enough to spark an initial discussion. I'll create a
page on the wiki for the API proposal to consolidate some of the
information once other people give their input.

Thanks,

Ben


Re: Erlang API Discussion

2009-03-05 Thread Ben Browning
On Thu, Mar 5, 2009 at 7:29 PM, Chris Anderson jch...@apache.org wrote:
> I'm not sure having the HTTP wrapper use the API is the best plan, as
> it might turn out to be indirection for indirection's sake. If it turns
> out to be simpler to use the Erlang API, then of course let's do it, but
> if it is slower or more confusing, then we shouldn't feel like we have
> to.

I'm fine with this, with the caveat that I'd like to reduce code
duplication as much as practical. A good first step could be to write
the Erlang API and later modify the HTTP API to use it if it makes
sense.


> +1 to keeping it limited to document CRUD on the first round. View
> queries will be harder to model as they rely on sending side-effects
> out the HTTP socket (maybe replace http socket with gen-server reply?)

I've been trying to figure out for the last few days exactly how view
queries would work in the Erlang API. If we don't have the HTTP API
rely on the Erlang API this becomes much easier and I'm sure we could
come up with a good solution.


> Internally, all database operations are updates, and they are all bulk
> (sometimes with bulk size of 1). I'm not sure how much we want to hide
> this from the user. It might be better to keep the API wrapper thin,
> so the user sees this. Then we can potentially add wrappers so you
> don't have to remember what the structure of a delete is, for
> instance.

I like the idea of growing the API as needed. It sounds like you feel
we could get by with an initial API that just allows you to retrieve a
document and update a document. Is that accurate?
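
To make the "thin core plus optional wrappers" shape concrete, here is a small Python sketch (not Erlang, and not a proposal for actual function names). Everything in it is a simplifying assumption: the database is an in-memory dict, revisions are plain integers, and there is no conflict detection.

```python
def update_docs(db, docs):
    """Thin core call: apply a batch of document updates to an in-memory db.

    `db` is a plain dict keyed by document id, standing in for a database.
    Returns the new (integer, simplified) revision of each document.
    """
    revs = []
    for doc in docs:
        current = db.get(doc["_id"])
        rev = (current["_rev"] + 1) if current else 1
        db[doc["_id"]] = {**doc, "_rev": rev}
        revs.append(rev)
    return revs


def save_doc(db, doc):
    """Convenience wrapper: a single save is a bulk update of size one."""
    return update_docs(db, [doc])[0]


def delete_doc(db, doc_id):
    """Convenience wrapper: a delete is an update that marks the doc deleted."""
    return update_docs(db, [{"_id": doc_id, "_deleted": True}])[0]


def open_doc(db, doc_id):
    """Return the stored document, or None if it is missing or deleted."""
    doc = db.get(doc_id)
    if doc is None or doc.get("_deleted"):
        return None
    return doc
```

The wrappers add no behaviour of their own, so an initial API could ship only open_doc and update_docs and grow the rest as needed.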


[jira] Updated: (COUCHDB-216) Make couchdb adhere more to OTP guidelines

2009-03-04 Thread Ben Browning (JIRA)

 [ 
https://issues.apache.org/jira/browse/COUCHDB-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Browning updated COUCHDB-216:
-

Attachment: 0001-Generate-the-modules-section-of-couch.app-via-the-Ma.patch


The modules section of couch.app was outdated. Instead of updating it manually, 
I've modified the Makefile to populate this based on the modules actually 
present when building. This should keep it from getting out of date in the 
future.
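
For readers who want the gist without opening the patch, the idea can be sketched in a few lines of Python. This is not the Makefile rule from the patch, just an illustration; it assumes module names are plain lowercase atoms that need no quoting.

```python
from pathlib import Path


def modules_entry(src_dir):
    """Build an Erlang-style {modules, [...]} term from the .erl files present."""
    names = sorted(path.stem for path in Path(src_dir).glob("*.erl"))
    return "{modules, [" + ", ".join(names) + "]}"
```

Run at build time, the result would be substituted into a couch.app template so the list always matches the source tree.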

 Make couchdb adhere more to OTP guidelines
 --

 Key: COUCHDB-216
 URL: https://issues.apache.org/jira/browse/COUCHDB-216
 Project: CouchDB
  Issue Type: Improvement
Reporter: Martin S
 Attachments: 0001-add-couch_app-and-couch_sup.patch, 
 0001-Generate-the-modules-section-of-couch.app-via-the-Ma.patch, 
 0002-add-missing-registered-process-names.patch, 
 0003-make-couchdb-startup-script-use-couch.rel.-load-conf.patch


 CouchDB could adhere to otp standards in a better way. 
 Currently we have:
 - couch.app is not uptodate
 - couch_server.erl is an amalgam of 2 behaviours, which is considered being 
 not good.
 From my beginner's perception of OTP, it seems CouchDB is not treated as a 
 running application when it is actually running. E.g. appmon doesn't show the 
 application once CouchDB is started. 




Re: Partitioning use cases

2009-03-01 Thread Ben Browning
David,


On Fri, Feb 27, 2009 at 7:16 PM, David Van Couvering
da...@vancouvering.com wrote:
> Hi, all.  I did a first pass at the high-level use cases, both to start with
> and then looking forward.
>
> http://wiki.apache.org/couchdb/Partitioning_proposal

Thanks for adding to the document :)


> I'm sure I'm missing some things (in particular, I don't fully understand
> the discussion about configuring the topology differently depending upon
> performance needs - I wonder if that's an implementation-specific detail).

From what I understand, the initial idea was to have the partitioning
be fairly static and not try to tackle all the dynamic challenges
up front. So initially a topology would be chosen based on the needs
of a particular dataset (data-intensive, map/reduce-intensive, etc.)
and remain fairly stable.


> One way perhaps I could help is to work on an API that maps a key to a node
> (with some sort of pluggable interface for various consistent hashing
> algorithms, starting maybe with Chord, given that there is an existing
> implementation in Erlang called Chordial -
> http://dawsdesign.com/drupal/chordial)

I think having a pluggable interface for choosing the consistent
hashing algorithm is a good idea. Chord looks very nice for systems
where nodes are dynamically added and removed all the time. I think
that's a bit more advanced than the initial implementation is shooting
for, but it maps very well to the long-term goal of CouchDB as a true
distributed database.
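
As a point of reference for what a key-to-node strategy might look like behind such an interface, here is a minimal hash ring in Python. To be clear, this is a plain consistent-hash ring with virtual points, not Chord; the class name, method names and the choice of MD5 are all assumptions made for the sketch.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring; one possible key-to-node strategy."""

    def __init__(self, nodes, points_per_node=64):
        self._ring = []  # sorted list of (position, node) pairs
        self._points = points_per_node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _position(text):
        # Any stable, well-mixed hash works; MD5 is used only for spread.
        return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        """Place several virtual points for the node on the ring."""
        for i in range(self._points):
            bisect.insort(self._ring, (self._position(f"{node}#{i}"), node))

    def node_for(self, key):
        """Return the node owning the first ring position at or after the key."""
        index = bisect.bisect_left(self._ring, (self._position(key), ""))
        return self._ring[index % len(self._ring)][1]
```

Any other strategy exposing the same node_for(key) call could be swapped in, and with this one, adding a node only moves keys onto the new node.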

These are my thoughts, so if they don't match the opinions of the
rest of the community, please speak up.

Ben


Re: Stats

2009-02-22 Thread Ben Browning
It happens to the best of us - thanks for fixing it so quickly.


On Sun, Feb 22, 2009 at 10:37 AM, Damien Katz dam...@apache.org wrote:
> Heh, I've done that too.
>
> -Damien
>
>
> On Feb 22, 2009, at 10:31 AM, Jan Lehnardt wrote:
>
>> Cheers Ben,
>>
>> indeed, the files didn't make it into the commit.
>>
>> They are now in as of r746734.
>>
>> Thanks again.
>> Jan


Re: Partitioned Clusters

2009-02-21 Thread Ben Browning
On Fri, Feb 20, 2009 at 7:34 PM, Chris Anderson jch...@apache.org wrote:
> I think so. I think that there could be proxy overlap / redundancy
> across all levels of the tree, and also in the case of a flat tree.
>
> As long as the proxies agree on how to hash from URLs to nodes it
> should just work.

I've been thinking about how to address the issue of allowing
different configurations for different needs. I think if all we do is
tell a proxy node who its children are, how to map IDs to those
children, and allow a proxy to also be a node, we can handle almost
any configuration.

Examples:
* All Peers - 2 nodes in the system, A & B. A is configured so odd IDs
map to A, even IDs map to B. B is configured with the same ID ranges.
You can load-balance across nodes A & B and take advantage of
increased write throughput. This is probably the simplest clustering
scenario for people that don't have enough traffic to fully utilize a
standalone proxy node.

* 1 or more proxies, multiple nodes - The proxies are all configured
identically to map document IDs among nodes A-J. Nodes A-J know
nothing about each other or their parents. In this scenario you can
very easily add proxy nodes as needed to handle the increased load
when aggregating results from more nodes.

* Tree structure - The top-level proxies are configured to map
document IDs to nodes. These nodes may in fact be other proxies which
are then configured to map to their nodes. Except for multiple levels
of proxies, this is the same as the above scenario.

Does it sound reasonable to expect a proxy to be aware of its children
but not vice-versa? In an actual implementation I see the list of
children and their mappings being stored in a document so that it
could be updated while running to add/remove children.
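
A small Python sketch of the "proxy knows its children, children know nothing" idea follows. The mapping documents, node names and field names are hypothetical, and the hash-modulo choice of child is only a placeholder for a real ID mapping (it is not consistent hashing). The all-peers case, where a proxy is also a node, is left out.

```python
import hashlib

# Hypothetical mapping documents, one per proxy; names and fields are
# illustrative only. Anything not listed here is treated as a leaf node.
PROXIES = {
    "top": {"children": ["left", "right"]},
    "left": {"children": ["A", "B"]},
    "right": {"children": ["C", "D"]},
}


def child_for(proxy_name, mapping, doc_id):
    """Pick a child with a stable hash of the id, salted by the proxy name.

    Salting keeps each level's choice independent; reusing one unsalted
    hash at every level would leave some leaves permanently unused.
    """
    children = mapping["children"]
    digest = hashlib.md5(f"{proxy_name}:{doc_id}".encode("utf-8")).hexdigest()
    return children[int(digest, 16) % len(children)]


def route(proxies, start, doc_id):
    """Follow mappings down from proxy `start`; return the path to a leaf."""
    path = [start]
    while path[-1] in proxies:
        name = path[-1]
        path.append(child_for(name, proxies[name], doc_id))
    return path
```

The flat scenario is the same code with a single entry in PROXIES, and identically configured proxies will always agree on the route for a given ID.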

Adding a child in this scenario would involve choosing an ID range,
replicating the relevant data from the other children, and updating
this mapping. This would depend on partial replication to replicate
only the data needed for the new child. I don't see this as something
that's too complex - the only issue I see is you'll probably need to
replicate data at least twice, once before the proxy mapping is
updated and once after to get any final data that was written to the
other children since the first replication. This also assumes you've
chosen a consistent hashing algorithm so that the data on all nodes
doesn't have to change when adding a single new node.

Removing a child node would be the opposite process. I could foresee
us coming up with a tool to automate most if not all of this process,
possibly only requiring the user to start the new CouchDB server, fill
in some values in Futon for ID mappings, and press a button.


Sound reasonable?

- Ben


Re: Partitioned Clusters

2009-02-19 Thread Ben Browning
Overall the model sounds very similar to what I was thinking. I just
have a few comments.

> In this model documents are saved to a leaf node depending on a hash
> of the docid. This means that lookups are easy, and need only to touch
> the leaf node which holds the doc. Redundancy can be provided by
> maintaining R replicas of every leaf node.

There are several use cases where a true hash of the docid won't be the
optimal partitioning key. The simple case is where you want to partition
your data by user, and in most non-trivial cases you won't be storing
all of a user's data under one document with the user's id as the docid.
A fairly simple solution would be to allow the developer to specify a
JavaScript function somewhere (not sure where this should live...) that
takes a docid and spits out a partition key. Then I could just prefix all
my doc ids for a specific user with that user's id and write the
appropriate partition function.
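
Here is roughly what I mean, sketched in Python instead of JavaScript. The "userid:rest" id convention, the function names and the hash-modulo placement are all assumptions for the example.

```python
import hashlib


def partition_key(doc_id):
    """Hypothetical partition function: the text before the first ':' in the id."""
    return doc_id.split(":", 1)[0]


def partition_for(doc_id, num_partitions):
    """Hash the partition key, not the whole id, to choose a partition."""
    key = partition_key(doc_id)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions
```

With ids like "alice:note-1" and "alice:profile", both documents share the key "alice" and therefore land on the same partition.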


> View queries, on the other hand, must be handled by every node. The
> requests are proxied down the tree to leaf nodes, which respond
> normally. Each proxy node then runs a merge sort algorithm (which can
> sort in constant space proportional to # of input streams) on the view
> results. This can happen recursively if the tree is deep.
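
The merge step described in the quoted paragraph can be sketched in Python with the standard library's heapq.merge, which holds only one pending row per input stream. The row shape (dicts with "key" and "value") is an assumption, and Python's ordering stands in for CouchDB's view collation here.

```python
import heapq


def merge_view_results(streams):
    """Lazily merge already-sorted row streams from child nodes by key."""
    return heapq.merge(*streams, key=lambda row: row["key"])
```

Because the result is itself a sorted stream, a parent proxy can feed it into the same function, which is the recursive case for deeper trees.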

If the developer has control over partition keys as suggested above, it's
entirely possible to have applications where view queries only need data
from one partition. It would be great if we could do something smart here or
have a way for the developer to indicate to Couch that all the data should
be on only one partition.

These are just nice-to-have features and the described cluster setup could
still be extremely useful without them.

The tree setup sounds interesting but I wonder how it would compare in
latency to a flat setup with the same number of leaf nodes. As long as the
developer can control the tree structure (# of children per parent) then this
concern shouldn't be an issue.

- Ben


Re: Couch clustering/partitioning Re: CouchSpray - Thoughts?

2009-02-19 Thread Ben Browning
I'd be very interested in any code that you could release. If that's
not possible, then your experiences and the challenges you faced while
implementing this cluster would definitely be a great help to the
community.

It's encouraging to see someone with a working cluster already running.

Ben