[ https://issues.apache.org/jira/browse/ACCUMULO-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018341#comment-15018341 ]
Keith Turner commented on ACCUMULO-4062:
----------------------------------------

Personally, I think the issue Josh brought up, deduping mutations that contain the same col+values in a different order, is a good reason not to do it, because I think handling this case properly would be expensive. Maybe it's cheap if we do not worry about that case, but since it's a half solution, users concerned about this may still have to dedupe outside the BatchWriter.

We could do something like the way Java streams work, e.g. {{new ZipOutputStream(new FileOutputStream(...))}}. One possibility is to create a DedupingBatchWriter that wraps a BatchWriter, like {{new DedupingBatchWriter(batchWriter, options)}}.

Also, Mutations are not usually used as keys in Accumulo code; most code just keys on the mutation's row. If something in Accumulo were going to rely on Mutation's hashCode and equals methods, they would need a good set of unit tests.

> Change MutationSet.mutations to use HashSet
> -------------------------------------------
>
>                 Key: ACCUMULO-4062
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4062
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client
>            Reporter: Dave Marion
>
> Change TabletServerBatchWriter.MutationSet.mutations from a
> {code}
> HashMap<String,List<Mutation>>
> {code}
> to
> {code}
> HashMap<String,HashSet<Mutation>>
> {code}
> so that duplicate mutations added by a client are not sent to the server.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
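A rough, self-contained sketch of the wrapper idea, mainly to show where the extra cost of the reordered col+values case comes from. Everything in it is hypothetical: {{Update}}, {{SimpleMutation}}, {{Writer}}, {{DedupingWriter}} and {{countSent}} are simplified stand-ins invented for illustration, not Accumulo's {{Mutation}}/{{BatchWriter}} API, and updates here carry no timestamp, visibility or delete flag. It also assumes the updates in one mutation touch distinct columns; if a mutation writes the same column twice, reordering can change the outcome, and treating reordered copies as duplicates would be wrong. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Self-contained sketch: Update, SimpleMutation, Writer and DedupingWriter are
// hypothetical stand-ins, NOT Accumulo's Mutation/BatchWriter API.
public class DedupingSketch {

  /** One column update (no timestamp, visibility or delete flag, for brevity). */
  record Update(String family, String qualifier, String value) implements Comparable<Update> {
    @Override
    public int compareTo(Update o) {
      int c = family.compareTo(o.family);
      if (c == 0) c = qualifier.compareTo(o.qualifier);
      if (c == 0) c = value.compareTo(o.value);
      return c;
    }
  }

  /** Hypothetical simplified mutation: a row plus its updates in insertion order. */
  record SimpleMutation(String row, List<Update> updates) {}

  /** Hypothetical minimal writer interface standing in for BatchWriter. */
  interface Writer {
    void addMutation(SimpleMutation m);
  }

  /** Decorator wrapping another writer, like new ZipOutputStream(new FileOutputStream(...)). */
  static final class DedupingWriter implements Writer {
    private final Writer delegate;
    private final boolean ignoreUpdateOrder;
    // Keyed on the row first (as most Accumulo code does), then on the update list.
    // Grows without bound here; a real wrapper would presumably clear it on flush.
    private final Map<String, Set<List<Update>>> seen = new HashMap<>();

    DedupingWriter(Writer delegate, boolean ignoreUpdateOrder) {
      this.delegate = delegate;
      this.ignoreUpdateOrder = ignoreUpdateOrder;
    }

    @Override
    public void addMutation(SimpleMutation m) {
      List<Update> key = new ArrayList<>(m.updates());
      if (ignoreUpdateOrder) {
        // Extra cost of the "same col+values, different order" case:
        // every mutation is copied and sorted, O(n log n) in its update count.
        Collections.sort(key);
      }
      // Set.add returns false when an equal key is already present: drop the duplicate.
      if (seen.computeIfAbsent(m.row(), r -> new HashSet<>()).add(key)) {
        delegate.addMutation(m);
      }
    }
  }

  /** Sends an exact duplicate and a reordered copy; returns how many reach the delegate. */
  static int countSent(boolean ignoreUpdateOrder) {
    List<SimpleMutation> sent = new ArrayList<>();
    Writer writer = new DedupingWriter(sent::add, ignoreUpdateOrder);
    Update a = new Update("cf", "a", "1");
    Update b = new Update("cf", "b", "2");
    writer.addMutation(new SimpleMutation("row1", List.of(a, b)));
    writer.addMutation(new SimpleMutation("row1", List.of(a, b))); // exact duplicate
    writer.addMutation(new SimpleMutation("row1", List.of(b, a))); // same updates, reordered
    return sent.size();
  }

  public static void main(String[] args) {
    System.out.println("order-sensitive: " + countSent(false) + " of 3 sent");   // 2 of 3
    System.out.println("order-insensitive: " + countSent(true) + " of 3 sent"); // 1 of 3
  }
}
```

With {{ignoreUpdateOrder}} off, only the exact duplicate is dropped, which is the "half solution"; turning it on also catches the reordered copy, at the price of copying and sorting every mutation's updates. Keeping this in a wrapper rather than inside the batch writer means clients who do not need deduping never pay that cost.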