Re: Enrolment open: Introduction to CouchDB Development

2011-12-14 Thread CGS

Hi Joan,

I am interested.

Best regards,
CGS



On 12/14/2011 11:13 PM, Joan Touzet wrote:

Enrolment for the "Introduction to CouchDB Development" course is now
open at http://moodle.wohmart.com/ !

As previously described, this course will get you up to speed in Erlang
(40%), the fundamentals of how CouchDB is implemented (40%), and will
culminate in a small group project (20%).

There will be guest presentations from these illustrious contributors:
   * Randall Leeds
   * Bob Dionne
   * Adam Kocoloski
   * Dale Harvey
   * Paul Davis
   * Benoit Chesneau
   * Robert Newson
   * Jan Lehnardt
   * ...with more to come!

The course runs from 2012.1.9 - 2012.3.20 or so, with a commitment of
about 4-8 hours per week recommended. Also, be aware that this is an
online studio course, meaning it's run entirely online and in the open.
It's 100% free (as in beer and freedom).

Prerequisites:
   Strong knowledge of at least one programming language, preferably
 non-scripted [*]
   Know how to use CouchDB. This is not a CouchDB course.

To enrol:
   1. Reply to me privately by email, or via IRC (freenode's
  #couchdb, wohali), or via Twitter (@wohali).
   2. I will give you the enrolment key.
   3. Visit http://moodle.wohmart.com/ , select the course and register
  a new user using the provided enrolment key.

These people get priority enrolment until 12-16 since they expressed
early interest:
   Timothy Chen
   Roman Geber
   Matt Adams
   Bryan Green
   Sean Copenhaver
   Pete Vander Giessen
   Clifford Hung
   Dave Cottlehuber

See you there,
Joan

[*] If your only programming language is JavaScript, prepare to devote
more time to the course for the first 4 weeks. Moving to Erlang from
JavaScript will take more effort than if you have a background in at
least one non-scripting language.




Re: Enrolment open: Introduction to CouchDB Development

2011-12-14 Thread Joan Touzet
Please reply off list, folks. I appreciate the very positive response
but I don't want to "pollute" the dev list with this.

Cheers,
Joan

On Wed, Dec 14, 2011 at 08:16:08PM -0600, Nathan Stott wrote:
> I'd like to get involved.  I use CouchDB all the time and even own the npm
> 'couchdb' repository but I have never done any actual coding on the CouchDB
> core itself.


Re: Enrolment open: Introduction to CouchDB Development

2011-12-14 Thread Robert French
I would definitely like to participate in this.

On Wed, Dec 14, 2011 at 8:16 PM, Nathan Stott  wrote:

> I'd like to get involved.  I use CouchDB all the time and even own the npm
> 'couchdb' repository but I have never done any actual coding on the CouchDB
> core itself.



-- 

Robert French
Departments of Mathematics and Computer Science
Austin Peay State University
roberto.fran...@gmail.com
(615) 829-6647


[jira] [Updated] (COUCHDB-1363) Race condition edge case when pulling local changes

2011-12-14 Thread Randall Leeds (Updated) (JIRA)

 [ https://issues.apache.org/jira/browse/COUCHDB-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Leeds updated COUCHDB-1363:
---

Attachment: 0001-Fix-a-race-condition-starting-replications.patch

> Race condition edge case when pulling local changes
> ---
>
>                 Key: COUCHDB-1363
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1363
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 1.0.3, 1.1.1
>            Reporter: Randall Leeds
>            Priority: Minor
>             Fix For: 1.2, 1.3
>
>         Attachments: 0001-Fix-a-race-condition-starting-replications.patch
>
>
> It's necessary to re-open the #db after subscribing to notifications so that 
> updates are not lost. In practice, this is rarely problematic because the 
> next change will cause everything to catch up, but if a quick burst of 
> changes happens while replication is starting the replication can go stale. 
> Detected by intermittent replicator_db js test failures.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (COUCHDB-1363) Race condition edge case when pulling local changes

2011-12-14 Thread Randall Leeds (Assigned) (JIRA)

 [ https://issues.apache.org/jira/browse/COUCHDB-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Leeds reassigned COUCHDB-1363:
--

Assignee: Filipe Manana





[jira] [Created] (COUCHDB-1363) Race condition edge case when pulling local changes

2011-12-14 Thread Randall Leeds (Created) (JIRA)
Race condition edge case when pulling local changes
---

                 Key: COUCHDB-1363
                 URL: https://issues.apache.org/jira/browse/COUCHDB-1363
             Project: CouchDB
          Issue Type: Bug
          Components: Database Core
    Affects Versions: 1.1.1, 1.0.3
            Reporter: Randall Leeds
            Priority: Minor
             Fix For: 1.2, 1.3


It's necessary to re-open the #db after subscribing to notifications so that 
updates are not lost. In practice, this is rarely problematic because the next 
change will cause everything to catch up, but if a quick burst of changes 
happens while replication is starting the replication can go stale. Detected by 
intermittent replicator_db js test failures.
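
The ordering problem described above can be illustrated with a toy in-memory
stand-in. This is only a sketch, not CouchDB's Erlang code: ToyDb, start_racy
and start_safe are hypothetical names. It assumes a snapshot sees only the
updates committed before it was opened, so a burst that lands between opening
the snapshot and subscribing is missed unless the snapshot is re-opened.

```python
class ToyDb:
    """Hypothetical stand-in for a database with an update sequence."""

    def __init__(self):
        self.update_seq = 0
        self.listeners = []

    def open_snapshot(self):
        # A snapshot only reflects updates committed before it was opened.
        return self.update_seq

    def subscribe(self, callback):
        self.listeners.append(callback)

    def write(self):
        self.update_seq += 1
        for callback in self.listeners:
            callback(self.update_seq)


def start_racy(db, burst):
    """Open, then subscribe: updates in the gap are never observed."""
    snapshot = db.open_snapshot()
    burst()                      # quick burst of changes during startup
    seen = []
    db.subscribe(seen.append)
    return max([snapshot] + seen)


def start_safe(db, burst):
    """Open, subscribe, then re-open so the gap is covered."""
    snapshot = db.open_snapshot()
    burst()
    seen = []
    db.subscribe(seen.append)
    snapshot = db.open_snapshot()  # re-open after subscribing
    return max([snapshot] + seen)


racy_db = ToyDb()
racy_seq = start_racy(racy_db, lambda: [racy_db.write() for _ in range(3)])

safe_db = ToyDb()
safe_seq = start_safe(safe_db, lambda: [safe_db.write() for _ in range(3)])
```

In the racy variant the starter stays behind the database until some later
change arrives, which matches the "next change will cause everything to catch
up" behaviour described in the report.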





Re: Enrolment open: Introduction to CouchDB Development

2011-12-14 Thread Nathan Stott
I'd like to get involved.  I use CouchDB all the time and even own the npm
'couchdb' repository but I have never done any actual coding on the CouchDB
core itself.



Re: Unique instance IDs?

2011-12-14 Thread Joan Touzet
On Wed, Dec 14, 2011 at 04:13:41PM -0800, Randall Leeds wrote:
> I might argue that these bits at the end are link and network layer
> issues that we don't care about.

On the contrary - until there is a solution in the mainline to deal with
NATs and firewalls, you cannot assume that a CouchDB instance can be
seen publicly. A very common use case (for the application I've been
working on) is for a desktop machine, behind a NAT, to have continuous
2-way replication with a public server. Imagine 30 or so machines
connected to such a central server. 31 databases, 62 ongoing
replications, but only one of them has a valid URL. Each desktop will have
to pull from and push to the central server; the central server on its
*own* cannot access the machines behind the firewalls.

This is not so unusual a case, is it?

> pulling from or pushing to the device. In this case, the db on the
> mobile device can be identified by a bare database name without any
> URL at all. Examples: Pull http://remotecouch/mydb -> mydb; Push mydb
> -> http://remotecouch/mydb. The replicator works like this today.

See above - the same approach (the exact opposite of what you've
suggested) is the most workable solution *today*.

> The CouchDB community is being very radical by suggesting that we
> might _serve_ content or address content stored on a mobile device.

Yes! And furthermore, solving how to serve data from behind a NAT or
firewall is equally challenging - but doable.

> Given the commitment CouchDB has made to HTTP so far, I hesitate to
> say that the solution to this problem is to subvert URLs.

I'm not saying that's the solution. I'm saying that URLs cannot
necessarily identify all participants in a replication scenario
reliably, especially given RFC 1918 space, variable host names, mobile
platforms and NAT.

> Again, this is getting away from the transitive checkpoint problem,
> which may turn out to obviate the need for identification of databases
> in the first place. Or, as I put it earlier, to focus the problem on
> "what is in this database" rather than "what database this is".

+100 on solving things this way. :)

-Joan


Re: Enrolment open: Introduction to CouchDB Development

2011-12-14 Thread David Pratt
I am very interested in this course.  Please let me know what I can do to 
enroll.

David Pratt
w> 801-422-4823
e> david.pr...@byu.edu





Re: Unique instance IDs?

2011-12-14 Thread Randall Leeds
On Wed, Dec 14, 2011 at 10:52, Alex Besogonov  wrote:
> On Wed, Dec 14, 2011 at 3:55 AM, Randall Leeds wrote:
>> I think you miss the point that was made above about mirrors, still,
>> unless I misunderstand. B may have other changes interleaved with
>> those received from A, whether from interactive updates or other
>> replications, making its hashes different.
> Of course. But that's not a problem, because we save all of A's
> changeset hashes that we've seen during the replication. B's resulting
> hash would be different, but we don't care about it.
>
> Also, since merging is commutative and associative we can reorder changesets
> in any way, so interleaving changes in itself should be OK.

I might need you to restate your solution for me to understand. If the
hash tree isn't of the sequence or id index, then I'm not seeing what
this applies to except the rev tree of a single document. Documents do
already have a sort of hash, as you identified, in their revision id.
Comparing the presence of these on the client and server is already
part of the replication protocol. However, since CouchDB is _not_ a
versioned document store ("_rev is only for MVCC"), there's no need to
optimize the problem of diffing the revs present. Only the newest revs
need ever be replicated.

The sequence number checkpointing is an optimization to avoid
comparing the revs for all documents. I think progress looks like
finding a way to skip large chunks of the seq index because they
contain changes already received, possibly from elsewhere.
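
A rough sketch of the two steps described above, under simplifying
assumptions: the sequence index is modelled as a list of (seq, doc_id, rev)
rows with one row per document, and the target's known revisions as a dict of
sets. The names changes_since and missing_revs are hypothetical and do not
come from the CouchDB code base.

```python
def changes_since(seq_index, checkpoint):
    """Return the (seq, doc_id, rev) rows newer than the checkpoint."""
    return [row for row in seq_index if row[0] > checkpoint]


def missing_revs(changes, target_revs):
    """Keep only the changes whose rev the target does not already hold."""
    return [
        (doc_id, rev)
        for _, doc_id, rev in changes
        if rev not in target_revs.get(doc_id, set())
    ]


# Source sequence index: one row per document, ordered by sequence number.
source_index = [
    (2, "b", "1-bbb"),
    (3, "a", "2-ccc"),
    (4, "c", "1-ddd"),
]

# The target already holds c's newest rev, e.g. received from elsewhere.
target_revs = {"a": {"1-aaa"}, "b": {"1-bbb"}, "c": {"1-ddd"}}

pending = changes_since(source_index, checkpoint=2)
to_fetch = missing_revs(pending, target_revs)
```

The checkpoint skips everything at or below sequence 2 without any
comparison. The row for "c" is still walked even though the target already
has it, which is the kind of work a smarter skip over the seq index would
avoid.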

So I'm not sure what your solution proposes. Can you go further?

-Randall


Re: Unique instance IDs?

2011-12-14 Thread Randall Leeds
On Wed, Dec 14, 2011 at 14:52, Joan Touzet  wrote:
> -1 on using URI/URLs, for the simple fact that mobile and desktop
> devices often don't have a stable hostname and/or IP address. This is a
> huge area where CouchDB is used, increasingly so, and attempting to tie
> a DB UUID to something inherently variable on the platform is doomed to
> fail.
>
> Renaming my PC or phone, getting a new DHCP address, connecting to a
> different network or changing the MAC address of my NIC should not
> invalidate my DBs, their "UUIDs", or cause unreasonable problems for
> replication.
>
> -Joan

I might argue that these bits at the end are link and network layer
issues that we don't care about. As far as the Web is concerned, the
URL is the address and it's more than just convenience and readability
that separates that from an IP address. URLs are foundational to
resource identification on the Web, and I'm really hesitant to "work
around" that (nevertheless I've been dreaming up and reading all kinds of
ways to do just this these days, and it's pretty hard). I definitely
don't mean to condescendingly suggest you don't know this already; I'm
just restating the basic facts.

Take, for example, the mobile use case. Most people, I'd submit, want
to push from and pull data to a mobile device. Given that the device
doesn't have a stable address (neither in IP nor URL space), most
would punt on the problem of serving from the mobile device, i.e.
pulling from or pushing to the device. In this case, the db on the
mobile device can be identified by a bare database name without any
URL at all. Examples: Pull http://remotecouch/mydb -> mydb; Push mydb
-> http://remotecouch/mydb. The replicator works like this today.
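
As a sketch, the pull and push examples above correspond to two request
bodies for CouchDB's _replicate endpoint, both posted to the CouchDB running
on the device itself. The remote URL is the placeholder from the examples, and
the helper name replication_body is hypothetical. No HTTP request is sent
here; only the JSON bodies are built.

```python
import json


def replication_body(source, target, continuous=False):
    """Build a JSON body with the source/target fields used by _replicate."""
    body = {"source": source, "target": target}
    if continuous:
        body["continuous"] = True
    return json.dumps(body, sort_keys=True)


REMOTE = "http://remotecouch/mydb"  # placeholder URL from the examples
LOCAL = "mydb"                      # bare database name, no URL needed

# Pull: remote -> local. Push: local -> remote.
pull = replication_body(REMOTE, LOCAL)
push = replication_body(LOCAL, REMOTE)
```

Because the device initiates both directions, it is identified only by the
bare name "mydb" and never needs a reachable address of its own.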

I think it's generally accepted that URLs don't point at the same
device all the time. In practice, obviously, they very frequently
"point at many devices" in that reverse proxies are used all over the
Web for load balancing. I might say it's out of scope for CouchDB to
worry about tying a stable URL to a mobile device. For the ops person
in the datacenter the story right now is clear: if you want to copy
your database, you should probably also copy the hostname over to the
new box or replication starts over.

The CouchDB community is being very radical by suggesting that we
might _serve_ content or address content stored on a mobile device.
Given the commitment CouchDB has made to HTTP so far, I hesitate to
say that the solution to this problem is to subvert URLs.

Again, this is getting away from the transitive checkpoint problem,
which may turn out to obviate the need for identification of databases
in the first place. Or, as I put it earlier, to focus the problem on
"what is in this database" rather than "what database this is".

-Randall


Re: Unique instance IDs?

2011-12-14 Thread Jason Smith
I think my point is, if URLs don't work, nothing will. There's no free lunch.

But if an optimization surfaces, I will happily stand corrected.

On Thu, Dec 15, 2011 at 5:52 AM, Joan Touzet  wrote:
> -1 on using URI/URLs, for the simple fact that mobile and desktop
> devices often don't have a stable hostname and/or IP address. This is a
> huge area where CouchDB is used, increasingly so, and attempting to tie
> a DB UUID to something inherently variable on the platform is doomed to
> fail.
>
> Renaming my PC or phone, getting a new DHCP address, connecting to a
> different network or changing the MAC address of my NIC should not
> invalidate my DBs, their "UUIDs", or cause unreasonable problems for
> replication.
>
> -Joan



-- 
Iris Couch


Re: Unique instance IDs?

2011-12-14 Thread Joan Touzet
-1 on using URI/URLs, for the simple fact that mobile and desktop
devices often don't have a stable hostname and/or IP address. This is a
huge area where CouchDB is used, increasingly so, and attempting to tie
a DB UUID to something inherently variable on the platform is doomed to
fail.

Renaming my PC or phone, getting a new DHCP address, connecting to a
different network or changing the MAC address of my NIC should not
invalidate my DBs, their "UUIDs", or cause unreasonable problems for
replication.
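
One way to picture an identifier that survives renames and address changes is
a random ID generated once and stored next to the data. This is a sketch under
assumptions, not how CouchDB identifies databases: the function name
load_or_create_instance_id and the idea of a sidecar file are hypothetical.

```python
import os
import uuid


def load_or_create_instance_id(path):
    """Return the stored ID, creating a random one on first use."""
    if os.path.exists(path):
        with open(path, "r", encoding="ascii") as handle:
            return handle.read().strip()
    # uuid4 is random: it does not depend on hostname, IP or MAC address.
    instance_id = uuid.uuid4().hex
    with open(path, "w", encoding="ascii") as handle:
        handle.write(instance_id)
    return instance_id
```

Calling the function again after a hostname or network change returns the
same value, because nothing about the host goes into the ID. Copying the file
to another machine would duplicate the ID, which is a separate problem.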

-Joan


Enrolment open: Introduction to CouchDB Development

2011-12-14 Thread Joan Touzet
Enrolment for the "Introduction to CouchDB Development" course is now
open at http://moodle.wohmart.com/ ! 

As previously described, this course will get you up to speed in Erlang
(40%), the fundamentals of how CouchDB is implemented (40%), and will
culminate in a small group project (20%).

There will be guest presentations from these illustrious contributors:
  * Randall Leeds
  * Bob Dionne
  * Adam Kocoloski
  * Dale Harvey
  * Paul Davis
  * Benoit Chesneau
  * Robert Newson
  * Jan Lehnardt
  * ...with more to come!

The course runs from 2012.1.9 - 2012.3.20 or so, with a commitment of
about 4-8 hours per week recommended. Also, be aware that this is an
online studio course, meaning it's run entirely online and in the open. 
It's 100% free (as in beer and freedom).

Prerequisites:
  Strong knowledge of at least one programming language, preferably
non-scripted [*]
  Know how to use CouchDB. This is not a CouchDB course.

To enrol:
  1. Reply to me privately by email, or via IRC (freenode's
 #couchdb, wohali), or via Twitter (@wohali).
  2. I will give you the enrolment key.
  3. Visit http://moodle.wohmart.com/ , select the course and register
 a new user using the provided enrolment key.

These people get priority enrolment until 12-16 since they expressed
early interest:
  Timothy Chen
  Roman Geber
  Matt Adams
  Bryan Green
  Sean Copenhaver
  Pete Vander Giessen
  Clifford Hung
  Dave Cottlehuber

See you there,
Joan

[*] If your only programming language is JavaScript, prepare to devote
more time to the course for the first 4 weeks. Moving to Erlang from
JavaScript will take more effort than if you have a background in at
least one non-scripting language.


Re: Unique instance IDs?

2011-12-14 Thread Alex Besogonov
On Wed, Dec 14, 2011 at 3:55 AM, Randall Leeds  wrote:
> I think you miss the point that was made above about mirrors, still,
> unless I misunderstand. B may have other changes interleaved with
> those received from A, whether from interactive updates or other
> replications, making its hashes different.
Of course. But that's not a problem, because we save all of A's
changeset hashes that we've seen during the replication. B's resulting
hash would be different, but we don't care about it.

Also, since merging is commutative and associative we can reorder changesets
in any way, so interleaving changes in itself should be OK.
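
A small sketch of the reordering claim, assuming for illustration that a
changeset is a set of (doc_id, rev) pairs and that merging behaves like set
union. That model is an assumption made here, not a description of CouchDB's
actual merge logic.

```python
from itertools import permutations


def merge(state, changeset):
    """Merge modelled as set union, which is commutative and associative."""
    return state | changeset


def apply_all(changesets):
    state = frozenset()
    for changeset in changesets:
        state = merge(state, changeset)
    return state


changesets = [
    frozenset({("doc1", "1-a")}),
    frozenset({("doc2", "1-b"), ("doc1", "2-c")}),
    frozenset({("doc3", "1-d")}),
]

# Apply the changesets in every possible order and collect the end states.
results = {apply_all(order) for order in permutations(changesets)}
```

All six orderings produce the same final state, so results holds a single
element. Any merge that is not commutative and associative would break this
property.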


Re: Unique instance IDs?

2011-12-14 Thread Randall Leeds
On Tue, Dec 13, 2011 at 20:08, Alex Besogonov  wrote:
> On Mon, Dec 12, 2011 at 10:26 PM, Paul Davis
>  wrote:
>>> * Merkle trees are great for two-way synchronization, but it's not 
>>> immediately clear to me how you'd use them to bootstrap a single source -> 
>>> target replication.  I might just be missing a straightforward extension of 
>>> the tech here.
>> This is the point that's important with checksums and so on. Merkle
>> trees are great when you want to mirror structured data but CouchDB
>> replication is not a mirror operation. Think: N DBs replicating to a
>> central DB. You have a mixture of things which breaks checksums (or at
>> least any obvious application I can think of given our internal
>> structures)
> Uhm. What are the things that break checksums? Right now revision IDs
> are _almost_ deterministic and it's not that hard to make them
> completely deterministic. And for replication purposes nothing else
> matters.
>
> To be exact: the only entity used for ID generation is the
> '[Deleted, OldStart, OldRev, Body, Atts2]' tuple, and only the 'Atts2'
> field can be non-deterministic. And that can be fixed (with other minor
> forward-looking features like explicit versioning).
>
> Then it's easy to devise a protocol to replicate based on hash trees.
> I'm thinking about this protocol:
>
> 1) The current state of the replicated database is identified by a hash.
> Suppose that we have unidirectional replication A->B.
>
> Let's denote the state of the initial database A as A1 and B's as B1.
>
> We store the ancestry as a list of hashes outside the database (so it
> doesn't influence the hash of the database).
>
> 2) As the first step B sends its list of replication ancestry.
>
> It's actually not even required to send the whole hashes each time,
> just send the first 4 bytes of each hash. That way even 1 million
> records of replication history would take only 4 MB. The 'A' server then
> replies with its own set of hashes with the matching initial bytes. If
> there are none, then the client falls back to the usual replication.
>
> So at this step 'B' knows the most recent common ancestor and requests
> the changes that have happened since that point in time. Each changeset,
> naturally, has its own hash.
>
> 3) After these changes are applied and merged, B's state is the A1
> state plus all of B's changes that might have happened since. Then B
> stores the hashes of the changesets that have been applied.
>
> That's it. Should work, as far as I see (it's 3am local time, so I
> might miss something).
>
> Overhead: 16 bytes for the hash information for each changeset.
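
Step 2 of the protocol quoted above (prefix exchange, then finding the most
recent common ancestor) could look roughly like this. It is a sketch under
assumptions: MD5 is used only because its digest is 16 bytes, matching the
stated overhead, ancestry lists are ordered oldest first, and all function
names are hypothetical.

```python
import hashlib

PREFIX_LEN = 4  # bytes of each hash that B sends to A


def changeset_hash(payload: bytes) -> bytes:
    """16-byte hash of a changeset (MD5 chosen only for its digest size)."""
    return hashlib.md5(payload).digest()


def prefixes(ancestry):
    """What B sends: the first PREFIX_LEN bytes of each hash it has stored."""
    return [h[:PREFIX_LEN] for h in ancestry]


def matching_hashes(own_ancestry, peer_prefixes):
    """What A replies: its full hashes whose prefix matches one sent by B."""
    wanted = set(peer_prefixes)
    return [h for h in own_ancestry if h[:PREFIX_LEN] in wanted]


def most_recent_common_ancestor(own_ancestry, candidates):
    """B keeps the candidates it really knows; the last one is the newest."""
    known = set(own_ancestry)
    common = [h for h in candidates if h in known]
    return common[-1] if common else None  # None: fall back to usual replication


# A has three changesets; B has replicated the first two and has a local one.
a_ancestry = [changeset_hash(p) for p in (b"c1", b"c2", b"c3")]
b_ancestry = [changeset_hash(p) for p in (b"c1", b"c2", b"local")]

reply = matching_hashes(a_ancestry, prefixes(b_ancestry))
ancestor = most_recent_common_ancestor(b_ancestry, reply)

# Size of the prefix list for one million history records, in bytes.
prefix_bytes = 1_000_000 * PREFIX_LEN
```

B compares full hashes after the prefix match, so a prefix collision only
costs a few extra bytes in A's reply and cannot select a wrong ancestor.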

I think you miss the point that was made above about mirrors, still,
unless I misunderstand. B may have other changes interleaved with
those received from A, whether from interactive updates or other
replications, making its hashes different.