[ 
https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=522547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522547
 ]

ASF GitHub Bot logged work on HIVE-24471:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Dec/20 04:51
            Start Date: 10/Dec/20 04:51
    Worklog Time Spent: 10m 
      Work Description: maheshk114 commented on a change in pull request #1736:
URL: https://github.com/apache/hive/pull/1736#discussion_r539843531



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
##########
@@ -712,6 +751,12 @@ private void processKey(Object row,
 
   @Override
   public void process(Object row, int tag) throws HiveException {
+    if (hashAggr) {
+      if (getConfiguration().get("forced.streaming.mode", "false").equals("true")) {

Review comment:
       I have removed it in the next commit; it was added for testing only.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 522547)
    Time Spent: 1h 20m  (was: 1h 10m)

> Add support for combiner in hash mode group aggregation 
> --------------------------------------------------------
>
>                 Key: HIVE-24471
>                 URL: https://issues.apache.org/jira/browse/HIVE-24471
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In map-side group aggregation, a partial grouped aggregation is computed to 
> reduce the data written to disk by the map task. For hash aggregation, 
> where the input data is not sorted, a hash table is used. If the hash table 
> grows beyond a configurable limit, the data is flushed to disk and a new 
> hash table is created. If the reduction achieved by the hash table is less 
> than the minimum hash aggregation reduction calculated at compile time, 
> map-side aggregation is converted to streaming mode. So if the first few 
> batches of records do not yield a significant reduction, the mode is 
> switched to streaming. This may hurt performance if subsequent batches of 
> records have fewer distinct values. To mitigate this, a combiner can be 
> added to the map task after the keys are sorted. This ensures that 
> aggregation is done where possible and reduces the data written to disk.
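
The behaviour described above can be illustrated with a small, self-contained
Java sketch. It is not Hive code: the class CombinerSketch, the methods
mapSideCount and combineSorted, and the parameters maxHashEntries and
minReduction are hypothetical names chosen for illustration. It assumes a
simple COUNT(*) aggregate, checks the reduction ratio only when the hash table
fills up, and uses an in-memory list in place of the data a map task would
write to disk.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Hive's GroupByOperator.
class CombinerSketch {

  // Map-side COUNT(*) per key. Starts in hash mode and switches to streaming
  // mode if a full hash table gave too little reduction.
  // ASSUMPTION: the ratio is checked when the table fills; "out" stands in
  // for the data the map task would write to disk.
  static List<Map.Entry<String, Long>> mapSideCount(
      List<String> keys, int maxHashEntries, double minReduction) {
    List<Map.Entry<String, Long>> out = new ArrayList<>();
    Map<String, Long> table = new LinkedHashMap<>();
    boolean streaming = false;
    long rowsInTable = 0;
    for (String key : keys) {
      if (streaming) {
        out.add(Map.entry(key, 1L)); // streaming mode: forward each row as-is
        continue;
      }
      table.merge(key, 1L, Long::sum);
      rowsInTable++;
      if (table.size() >= maxHashEntries) {
        // Entries per input row: a value near 1.0 means almost no reduction.
        double ratio = (double) table.size() / rowsInTable;
        flush(table, out);
        rowsInTable = 0;
        if (ratio > minReduction) {
          streaming = true;
        }
      }
    }
    flush(table, out);
    return out;
  }

  // Emits every hash table entry as a partial aggregate and empties the table.
  private static void flush(Map<String, Long> table,
                            List<Map.Entry<String, Long>> out) {
    for (Map.Entry<String, Long> e : table.entrySet()) {
      out.add(Map.entry(e.getKey(), e.getValue()));
    }
    table.clear();
  }

  // Combiner: sorts the partial aggregates by key, then merges adjacent
  // records that share a key by summing their counts.
  static List<Map.Entry<String, Long>> combineSorted(
      List<Map.Entry<String, Long>> partials) {
    List<Map.Entry<String, Long>> sorted = new ArrayList<>(partials);
    sorted.sort(Map.Entry.comparingByKey());
    List<Map.Entry<String, Long>> out = new ArrayList<>();
    String currentKey = null;
    long sum = 0;
    for (Map.Entry<String, Long> p : sorted) {
      if (currentKey != null && !currentKey.equals(p.getKey())) {
        out.add(Map.entry(currentKey, sum)); // key changed: emit finished group
        sum = 0;
      }
      currentKey = p.getKey();
      sum += p.getValue();
    }
    if (currentKey != null) {
      out.add(Map.entry(currentKey, sum));
    }
    return out;
  }

  public static void main(String[] args) {
    // The first four keys are all distinct, so the first batch gives no
    // reduction and the sketch falls back to streaming mode.
    List<String> keys = List.of("a", "b", "c", "d", "x", "x", "x", "x");
    List<Map.Entry<String, Long>> partials = mapSideCount(keys, 4, 0.5);
    List<Map.Entry<String, Long>> combined = combineSorted(partials);
    System.out.println(partials.size() + " records before the combiner");
    System.out.println(combined.size() + " records after: " + combined);
  }
}
```

With this input, the first batch of four distinct keys triggers the switch to
streaming mode, so the four later "x" rows are forwarded unaggregated and
eight records reach the combiner. The combiner collapses them to five, which
is the saving the issue aims for.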



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
