Hi Ronny,
sorry for the late reply.
One way to re-introduce optimistic locking is saving the new revision over the latest one and then copying the previous revision of the doc into a new doc. You can't run compaction in between, but since you control it, JUST DON'T CALL IT ;-).
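A minimal in-memory sketch of that scheme, with a plain Python dict standing in for the database (the function name, archive id format, and conflict check are all made up for illustration; real code would issue HTTP PUTs against CouchDB):

```python
import copy
import itertools

# In-memory stand-in for a CouchDB database: doc id -> document dict.
db = {}
_seq = itertools.count(1)

def save_with_history(doc_id, new_body, expected_rev):
    """Save new_body over the latest revision, archiving the old one first."""
    current = db.get(doc_id)
    if current is not None:
        if current["_rev"] != expected_rev:
            # Optimistic locking: refuse the write if someone else got there first.
            raise ValueError("conflict: document was updated concurrently")
        # Copy the previous revision of the doc into a new, separate document.
        archive_id = "%s::rev-%s" % (doc_id, current["_rev"])
        db[archive_id] = copy.deepcopy(current)
    # Overwrite the main document with the new body and a fresh revision marker.
    new_rev = str(next(_seq))
    db[doc_id] = dict(new_body, _rev=new_rev)
    return new_rev

rev1 = save_with_history("bond-123", {"price": 99.5}, None)
rev2 = save_with_history("bond-123", {"price": 99.7}, rev1)
```

The archive copies are ordinary documents, which is why compaction must not run between the overwrite and the copy.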
Cheers
Jan
--
On Sep 14, 2008, at 23:49, Ronny Hanssen wrote:
Thanks for your reply, Jan.

I do remember the discussion on the mailing list, but at the time I didn't understand the argumentation. Maybe because I really didn't have time to dive into the matter back then. But it has seriously puzzled me since. Then this post appears and I jump at the chance to get this cleared up (sorry for being slow - which makes me the opposite of arrogant, I guess :D).
But, I don't have a solution. I guess you are right in that sense. I just fail to see that making new docs makes life easier? I believe it makes the single node case worse and probably equally difficult (or worse) for the distributed multiple node architecture. Reading from what you say, there is "evil" lurking in the replication process no matter which way we handle this. I mean, for multiple nodes the replication would probably be slower than the time it takes for users changing the same doc on two different nodes to be informed. This would result in multiple versions of the same doc being around, at least until replication - when couchdb would find out that two competing versions exist. I might be wrong about this, but the users can't be left waiting for an "ok-saved" reply from couchdb "forever", right? So, couchdb would have to decide which version "wins" during replication, right?
Considering the effects you are hinting at, I'd personally want a single node couchdb for writes, with extra nodes for reading and serving views... Maybe additional write-nodes for different doc-types (one write-node per doc-type)... Just to "ensure" that there cannot be two+ docs updated at two+ nodes simultaneously. That is, in the beginning I'd really rather go for a single node, with a replicated backup/failover. As (if) system stress increases I'd opt for splitting writes and reads across nodes and/or creating write-nodes designated for different doc-types. This is still not perfect, but distributed never will be, really.
Unless... If the couchdb data was stored in a distributed file-system (NAS or SAN), each copy of the couchdb process would be operating on the same disk. This doesn't mean more data-reliability, and it also imposes delays in reads and writes. But it would mean that couchdb would be scalable (multiple "virtual" nodes working on the same physical disk). Other "physical" nodes could be created that would replicate as couchdb is set up to do already. So, allowing "virtual" nodes could work out as a nice addition, I think.
But then again, my knowledge of distributed file-systems (NAS or SAN) is really limited... And I might have missed out on a lot more than that - so all this might of course just be stupid :)
Thanks for reading.
~Ronny
2008/9/14 Jan Lehnardt <[EMAIL PROTECTED]>
Hi Ronny,
On Sep 14, 2008, at 11:45, Ronny Hanssen wrote:
Or have I seriously missed out on some vital information? Because, based on the above I still feel very confused about why we cannot use the built-in rev-control mechanism.
You correctly identify that adding revision control to a single node instance of CouchDB is not that hard (a quick search through the archives would have told you, too :-) Making all that work in a distributed environment with replication conflict detection and all is mighty hard. If you can come up with a nice and clean solution to make proper revision control work with CouchDB's replication, including all the weird edge cases I don't even know about (aren't I arrogant this morning? :), we are happy to hear about it.
Cheers
Jan
--
~Ronny
2008/9/14 Jeremy Wall <[EMAIL PROTECTED]>
Two reasons.
* First, as I understand it, the revisions are not changes between documents. They are actual full copies of the document.
* Second, revisions get blown away when doing a database compact. Something you will more than likely want to do, since it eats up database space fairly quickly (see above for the reason why).

That said, there is nothing preventing you from storing revisions in CouchDB. You could store a changeset for each document revision in a separate revision document that accompanies your main document. It would be really easy, and designing views to take advantage of them to show a revision history for your document would be really easy too.

I suppose you could use the revisions that CouchDB stores, but that wouldn't be very efficient since each one is a complete copy of the document. And you couldn't depend on that "feature" not changing behaviour on you in later versions, since it's not intended for revision history as a feature.
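A sketch of what such a changeset and companion revision document could look like - the diff format, field names, and `type`/`seq` convention here are just one possible choice, not anything CouchDB prescribes:

```python
def changeset(old, new):
    """Field-level diff between two document bodies (one possible format)."""
    changed = {k: new[k] for k in new if old.get(k) != new[k]}
    removed = [k for k in old if k not in new]
    return {"changed": changed, "removed": removed}

# The companion "revision document" that would accompany the main doc:
delta = changeset({"price": 99.5, "rating": "AA"},
                  {"price": 99.7, "rating": "AA"})
revision_doc = {"type": "revision", "doc_id": "bond-123",
                "seq": 1, "delta": delta}

# A CouchDB view (JavaScript, shown as a string) could then emit these
# by document id and sequence number to list the history in order:
map_fun = '''function(doc) {
  if (doc.type == "revision") emit([doc.doc_id, doc.seq], doc.delta);
}'''
```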
On Sat, Sep 13, 2008 at 7:24 PM, Ronny Hanssen <[EMAIL PROTECTED]> wrote:
Why is the revision control system in couchdb inadequate for, well, revision control? I thought that this feature indeed was a feature, not just an internal mechanism for resolving conflicts?
Ronny
2008/9/14 Calum Miller <[EMAIL PROTECTED]>
Hi Chris,
Many thanks for your prompt response.
Storing a complete new version of each bond/instrument every day seems a tad excessive. You can imagine how fast the database will grow over time if a unique version of each instrument must be saved, rather than just the individual changes. This must be a common pattern, not confined to investment banking. Any ideas how this pattern can be accommodated within CouchDB?
Calum Miller
Chris Anderson wrote:
Calum,
CouchDB should be easily able to handle this load.

Please note that the built-in revision system is not designed for document history. Its sole purpose is to manage conflicting documents that result from edits done in separate copies of the DB, which are subsequently replicated into a single DB.

If you allow CouchDB to create a new document for each daily import of each security, and create a view which makes these documents available by security and date, you should be able to access securities history fairly simply.
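A sketch of that pattern - the `security`/`date` field names are assumptions, and the view itself would be JavaScript; the Python below just mimics the map step to show how the composite key sorts:

```python
# The CouchDB view (JavaScript, shown as a string for reference):
map_fun = '''function(doc) {
  if (doc.security && doc.date) emit([doc.security, doc.date], null);
}'''

# Pure-Python equivalent of the map step, to show the resulting key order.
def map_doc(doc):
    if "security" in doc and "date" in doc:
        yield ([doc["security"], doc["date"]], None)

docs = [
    {"security": "BOND-A", "date": "2008-09-12", "price": 99.1},
    {"security": "BOND-A", "date": "2008-09-13", "price": 99.4},
    {"security": "BOND-B", "date": "2008-09-13", "price": 101.0},
]
rows = sorted(k for doc in docs for k, _ in map_doc(doc))
# Querying the view with startkey=["BOND-A"] and endkey=["BOND-A", {}]
# would return BOND-A's daily history in date order.
```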
Chris
On Sat, Sep 13, 2008 at 12:31 PM, Calum Miller <[EMAIL PROTECTED]> wrote:
Hi,
I'm trying to evaluate CouchDB for use within investment banking, yes some of these banks still exist. I want to load 500,000 bonds into the database, with each bond containing around 100 fields. I would be looking to bulk load a similar amount of these bonds every day whilst maintaining a history via the revision feature. Are there any bulk load features available for CouchDB, and any tips on how to manage regular loads of this volume?
Many thanks in advance and best of luck with this project.
Calum Miller
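On the bulk load question: CouchDB does expose a bulk endpoint, `POST /dbname/_bulk_docs`, which accepts many documents in one request as `{"docs": [...]}`. A minimal sketch of building such a payload - the bond ids and fields below are invented for illustration, and the actual POST is left out:

```python
import json

# Build a _bulk_docs payload: CouchDB accepts many docs in a single
# POST to /dbname/_bulk_docs with a JSON body of the form {"docs": [...]}.
bonds = [
    {"_id": "bond-%06d" % i, "isin": "XS%010d" % i, "price": 100.0}
    for i in range(3)
]
payload = json.dumps({"docs": bonds})
```

Batching the daily import into a few large `_bulk_docs` requests avoids one HTTP round-trip per bond.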