On Jan 29, 2013, at 3:45 AM, "Jukka Zitting" <jukka.zitt...@gmail.com> wrote:
> Hi,
>
> On Tue, Jan 29, 2013 at 1:21 PM, Thomas Mueller <muel...@adobe.com> wrote:
>> It's not clear to me how to support scalable concurrent writes. This is
>> also a problem with the current MongoMK design, but in your design I
>> actually see more problems in this area (concurrent writes to nodes in
>> the same segment, for example). But maybe it's just that I don't
>> understand this part of your design yet.
>
> Segments are immutable, so a commit would create a new segment instead
> of modifying an existing one. The new segment would contain just the
> modified parts of the tree and refer to the older segment(s) for the
> remaining tree. A quick estimate of the size overhead of a minimal
> commit that updates just a single property is in the order of hundreds
> of bytes, depending a bit on the content structure.

Does this mean that modifying the same content 10 times, changing "most
properties" on a certain node, will grow the repo 10x?

Thanks
Tyson

>> The data format in your proposal seems to be binary and not JSON. For
>> me, using JSON would have the advantage that we can use MongoDB
>> features (queries, indexes, atomic operations, debugging, ...). With
>> your design, only 1% of the MongoDB features could be used (store a
>> record, read a record), so we would basically need to implement the
>> remaining features ourselves. On the other hand, it would be extremely
>> simple to port to another storage engine. As far as I understand, all
>> the data might as well be stored in the data store / blob store with
>> very little change.
>
> Right. In addition to storage independence, the main reason for going
> with a custom binary format instead of JSON was to avoid having to
> parse an entire segment just to access an individual node or value.
>
> Note that the proposed design actually does rely on lots of MongoDB
> features beyond basic CRUD. Things like sharding, distributed access,
> atomic updates, etc. are essential for the design to scale up well.
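The copy-on-write behaviour Jukka describes can be sketched roughly as
follows. This is a hypothetical illustration, not the actual Oak
SegmentMK code: commits append immutable segments that hold only the
changed records, so repeated modification grows the store by the size of
each delta (until some later compaction merges segments), rather than
copying the whole tree.

```python
class Segment:
    """An immutable bundle of node records, identified by an id."""
    _next_id = 0

    def __init__(self, records):
        self.id = Segment._next_id
        Segment._next_id += 1
        self.records = dict(records)   # path -> properties, frozen at commit


class Store:
    """Append-only segment store; total size only ever grows."""

    def __init__(self):
        self.segments = []

    def commit(self, changed_records):
        # A commit writes a NEW segment containing just the changed
        # records; unchanged subtrees stay in the older segments.
        seg = Segment(changed_records)
        self.segments.append(seg)
        return seg

    def size(self):
        # Rough size proxy: total number of stored property values.
        return sum(len(props)
                   for seg in self.segments
                   for props in seg.records.values())


store = Store()
store.commit({"/a": {"p": 1, "q": 2}, "/b": {"r": 3}})  # base content

# Changing "most properties" of /a ten times rewrites the /a record in
# ten small new segments; /b is never copied again.
for i in range(10):
    store.commit({"/a": {"p": 1, "q": 2 + i}})

# The store now holds the base segment plus ten per-commit deltas.
```

So under this model the answer to the 10x question is: the store grows by
roughly the size of the changed records per commit, not by the size of
the whole repository.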
>> As far as I understand, a commit where only one single value is
>> changed would result in one journal entry and one segment. I was
>> wondering whether it would be possible to split a segment / journal
>> into smaller blocks in such a case, but I'm not sure how complex that
>> would be. And the reverse: merge small segments from time to time.
>
> Indeed, see my response to Marcel's post.
>
> BR,
>
> Jukka Zitting
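Jukka's point about avoiding a full parse of a segment can be sketched
like this. It is a hypothetical toy layout (record ids, offsets, and the
`struct` packing are all invented for illustration), showing why a binary
segment with a record index allows a reader to seek straight to one
record, whereas a single JSON document would have to be deserialized in
full before any individual node could be read.

```python
import struct


def write_segment(records):
    """Pack records back-to-back, prefixed by an (id, offset, length) index."""
    body = b""
    index = {}
    for rec_id, payload in records.items():
        index[rec_id] = (len(body), len(payload))
        body += payload
    # Header: record count, then one fixed-size index entry per record.
    header = struct.pack(">I", len(index))
    for rec_id, (off, length) in index.items():
        header += struct.pack(">III", rec_id, off, length)
    return header + body


def read_record(segment, wanted_id):
    """Fetch one record via the index; the other records are never parsed."""
    (count,) = struct.unpack_from(">I", segment, 0)
    for i in range(count):
        rec_id, off, length = struct.unpack_from(">III", segment, 4 + 12 * i)
        if rec_id == wanted_id:
            body_start = 4 + 12 * count
            return segment[body_start + off : body_start + off + length]
    raise KeyError(wanted_id)


seg = write_segment({1: b"node-a", 2: b"node-b", 3: b"node-c"})
record = read_record(seg, 2)   # touches only the index and one record
```

The same fixed-size index entries are also what make it cheap for one
segment to refer into another: a reference is just a segment id plus a
record offset, no re-serialization required.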