I am not sure that BigTable can really be thought of as a map, in the sense of a Java TreeMap. Inserting a key identical to an existing key will not overwrite the value deterministically, as it would in a TreeMap. To truly overwrite a value you must insert with a key that has a greater timestamp.
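To make the contrast concrete, here is a minimal sketch in plain Java. It is not Accumulo or BigTable code: the VKey class, its comparator, and the method names are hypothetical, and the key is reduced to a row plus a timestamp (a real key also carries colfam, colqual, and colvis). It only illustrates the two behaviors discussed above, assuming timestamps sort newest first within a row.

```java
import java.util.Comparator;
import java.util.TreeMap;

class VersionedKeyDemo {
    // Hypothetical simplified key: only a row and a timestamp.
    static final class VKey {
        final String row;
        final long timestamp;

        VKey(String row, long timestamp) {
            this.row = row;
            this.timestamp = timestamp;
        }
    }

    // Assumed ordering: row ascending, then timestamp descending (newest first).
    static final Comparator<VKey> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        return c != 0 ? c : Long.compare(b.timestamp, a.timestamp);
    };

    // Plain TreeMap: a second put with an identical key replaces the first value.
    static String plainTreeMapOverwrite() {
        TreeMap<String, String> map = new TreeMap<>();
        map.put("row1", "first");
        map.put("row1", "second");
        return map.get("row1");
    }

    // Timestamp as part of the key: both entries are kept as separate versions,
    // and the one with the greater timestamp sorts first within the row.
    static String newestVersion() {
        TreeMap<VKey, String> table = new TreeMap<>(ORDER);
        table.put(new VKey("row1", 100L), "old");
        table.put(new VKey("row1", 200L), "new");
        // Long.MAX_VALUE sorts before every real timestamp in this row, so
        // ceilingEntry returns the newest stored version of "row1".
        return table.ceilingEntry(new VKey("row1", Long.MAX_VALUE)).getValue();
    }

    public static void main(String[] args) {
        System.out.println(plainTreeMapOverwrite());
        System.out.println(newestVersion());
    }
}
```

In the second method the "overwrite" only happens because the new entry has a greater timestamp; nothing in this sketch says what a BigTable-style store does when two entries share every key field, including the timestamp.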
To support updates to a key using the exact same timestamp, a BigTable implementation would need to keep another hidden timestamp (or something that indicates order of arrival). Otherwise the system has no way to know which value came second and which to suppress. I do not remember any mention of a secondary hidden timestamp in the BigTable paper. Without this extra information I am not sure how the BigTable model would make deterministic decisions that mimic the order-of-arrival behavior of a TreeMap. Therefore I suspect BigTable treats keys that are exactly the same in the same way it treats multiple versions.

Keith

On Thu, Dec 22, 2011 at 4:20 PM, Aaron Cordova <[email protected]> wrote:
> _You_ can think of it that way, cause you're Adam Fuchs, distributed database
> expert extraordinaire, but that's not how the BigTable data model was
> described by the original authors - "BigTable is a sparse, sorted,
> distributed, multidimensional map", and most users do understand Accumulo to
> be a map of keys to values where the keys are made up of a row, colfam,
> colqual, colvis, and timestamp and the values are arbitrary byte pairs.
>
> To start explaining to people that Accumulo is a multi-map, or to actually
> make it into a multi-map (i.e. allowing identical keys, where a key includes
> the timestamp), would be a mistake, in my opinion.
>
>
> On Dec 22, 2011, at 4:09 PM, Adam Fuchs wrote:
>
>> Sorry, I thought we were talking about users' perceptions of semantics.
>> Bigtable also supports holding multiple versions of key/value pairs, so it
>> can be thought of as having an underlying multi-map as well.
>>
>> Adam
>>
>>
>> On Thu, Dec 22, 2011 at 4:04 PM, Aaron Cordova <[email protected]> wrote:
>>
>>>
>>> On Dec 22, 2011, at 4:00 PM, Adam Fuchs wrote:
>>>
>>>> Timestamp doesn't usually make
>>>> it into the uniqueness concept, from a user's perspective, even though that
>>>> affects the sort order of Keys. In fact, most users let Accumulo set the
>>>> timestamp for them. I think your definition of uniqueness takes timestamp
>>>> into account, and from that perspective what we're doing is sort of like
>>>> providing a finer grained timestamp instead of using one timestamp for an
>>>> entire Mutation (or for all Mutations that show up within a millisecond).
>>>
>>> Timestamps do define separate keys. This is not just my definition - this
>>> is in the BigTable design as well as Hbase's, and likely every other
>>> BigTable clone.
>>>
