[ https://issues.apache.org/jira/browse/HIVE-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401067#comment-13401067 ]
Ashutosh Chauhan commented on HIVE-3048:
----------------------------------------
+1
The existing implementation actually looks buggy to me. It checks for the
existence of one object (p) and then adds a different object (the copy of p).
In the general case, the two objects may have different hashcodes, and then
the check is useless. It will work, however, as long as the underlying
object's hashcode is based on its value, which is the case for primitive types
and for containers of primitive types, and therefore for Hive datatypes. It's
always good practice to just add your objects to the set and let the set take
care of duplicate elimination.
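
To illustrate the point above, here is a minimal, self-contained sketch. It is
not Hive code: Raw and toStandard are hypothetical stand-ins for an input
object with identity-based hashcode and for the copy to a standard object, and
a plain HashSet stands in for the aggregation buffer's container. Under those
assumptions, the contains() check never matches, and the set's own add() does
all of the de-duplication.

```java
import java.util.HashSet;
import java.util.Set;

final class CollectSetSketch {
    /** Hypothetical input wrapper with identity-based equals/hashCode (not a Hive class). */
    static final class Raw {
        final String value;
        Raw(String value) { this.value = value; }
    }

    /** Hypothetical stand-in for copying the input to a standard, value-based object. */
    static Object toStandard(Raw p) {
        return p.value;
    }

    /** Mirrors the existing pattern: check for one object, then add another. */
    static void putWithCheck(Set<Object> container, Raw p) {
        if (container.contains(p)) {
            return; // never taken here: the set holds Strings, not Raw instances
        }
        container.add(toStandard(p));
    }

    /** Proposed pattern: just add the copy and let the set de-duplicate. */
    static void putDirect(Set<Object> container, Raw p) {
        container.add(toStandard(p));
    }

    public static void main(String[] args) {
        Set<Object> checked = new HashSet<>();
        Set<Object> direct = new HashSet<>();
        for (String s : new String[] {"a", "b", "a"}) {
            putWithCheck(checked, new Raw(s));
            putDirect(direct, new Raw(s));
        }
        System.out.println(checked.equals(direct)); // prints "true"
        System.out.println(direct.size());          // prints "2"
    }
}
```

Both variants end up with the same set, so in this sketch the existence check
only adds an extra lookup per row.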
> Collect_set Aggregate does unnecessary check for value.
> -------------------------------------------------------
>
> Key: HIVE-3048
> URL: https://issues.apache.org/jira/browse/HIVE-3048
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 0.8.1
> Reporter: Edward Capriolo
> Assignee: Edward Capriolo
> Attachments: HIVE-3048.patch.1.txt
>
>
> Sets already de-duplicate for free, so there is no need for an existence check.
> {noformat}
> private void putIntoSet(Object p, MkArrayAggregationBuffer myagg) {
>   if (myagg.container.contains(p))
>     return;
>   Object pCopy = ObjectInspectorUtils.copyToStandardObject(p,
>       this.inputOI);
>   myagg.container.add(pCopy);
> }
> {noformat}