Aaron, I have to disagree with you. By default, Accumulo tables are distributed maps. However, as soon as you configure an aggregator or some other interesting iterator on a table, the semantics for that table change and it is no longer a "proper" distributed map. Therefore I claim that the basic tenet to which you refer does not exist as such.
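As a rough illustration of that difference, here is a toy Python model, not Accumulo code. The function names `apply_as_map` and `apply_with_summing` are hypothetical, keys are flattened to plain strings, and "last write wins" is an assumed stand-in for whatever a table without an aggregator would actually keep.

```python
from collections import defaultdict


def apply_as_map(entries):
    """Plain map semantics (assumed): a later value for a key replaces the earlier one."""
    table = {}
    for key, value in entries:
        table[key] = value
    return table


def apply_with_summing(entries):
    """Toy stand-in for a summing aggregator: every value written to a key contributes."""
    table = defaultdict(int)
    for key, value in entries:
        table[key] += value
    return dict(table)


# Two writes to the same key, as in the a:foo:bar example.
writes = [("a:foo:bar", 1), ("a:foo:bar", 2)]

print(apply_as_map(writes))        # {'a:foo:bar': 2}
print(apply_with_summing(writes))  # {'a:foo:bar': 3}
```

Under the map model the second write hides the first; under the summing model both writes count, which is why a table with an aggregator no longer behaves like a plain map.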
Users generally don't set the timestamps in a mutation, and aggregators certainly don't preserve the keys that they aggregate. Are you suggesting that modifying the value associated with a key that has already contributed to a persisted aggregate should have an effect that is dependent on the original value? So, if I sum a:foo:bar->1 and then a:foo:bar->2, I should get 2? The fix that is suggested in this ticket just makes the behavior consistent between the cases of putting two identical entries in one mutation versus putting the two entries in two mutations. However we account for the semantics of aggregation, we should be in favor of this change.

Adam

On Thu, Dec 22, 2011 at 12:31 PM, Aaron Cordova (Commented) (JIRA) <[email protected]> wrote:

> [https://issues.apache.org/jira/browse/ACCUMULO-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174913#comment-13174913]
>
> Aaron Cordova commented on ACCUMULO-227:
> ----------------------------------------
>
> What the client should expect is that Accumulo will only store/process one
> value per unique key: Accumulo is a distributed map. Even if it's only for
> aggregation's sake, allowing Mutations to submit multiple values per unique
> key and processing all those values, rather than arbitrarily choosing one,
> violates the concept of a map, which will cause more confusion on the part
> of users.
>
> The right thing to do for users who want to submit lots of values to
> aggregate under a sub key is to insist that they make their cells differ by
> at least one element in the key. Again, aggregating multiple values under
> the same key violates the basic tenet that Accumulo is a map. Aggregation
> is performed across different keys sharing a sub key.
>
> If having the users generate unique timestamps is a problem, there are
> several strategies for dealing with that. One is to generate random
> timestamps. If aggregation is being done over timestamps, the actual
> timestamp shouldn't matter / ever be interpreted. If there are worries
> about Accumulo doing something undesired with random timestamps, one could
> generate random column qualifiers, etc. and aggregate over those.
>
> To address what Adam said about versioning - aggregating tables should
> probably turn off the iterator that only keeps the latest version. But that
> has nothing to do with the policy for handling multiple identical cells.
>
> Finally, I'm not advocating we do anything to support aggregation on the
> client side, but rather leave it up to the application developer to exploit
> any opportunities for aggregation in their application.
>
> > Improve in memory map counts to provide cell level uniqueness for
> > repeated columns in mutation
> > -----------------------------------------------------------------
> >
> >                 Key: ACCUMULO-227
> >                 URL: https://issues.apache.org/jira/browse/ACCUMULO-227
> >             Project: Accumulo
> >          Issue Type: Improvement
> >          Components: tserver
> >            Reporter: John Vines
> >            Assignee: John Vines
> >             Fix For: 1.5.0
> >
> > Currently for isolation we only isolate mutations. This doesn't allow
> > mutations with identical cells within it. We should increase the mutation
> > counts to account for each individual cell instead of each mutation.
