[jira] [Work logged] (HIVE-26655) VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel

ASF GitHub Bot (Jira) Mon, 27 Mar 2023 04:51:13 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-26655?focusedWorklogId=853152&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853152
 ]


ASF GitHub Bot logged work on HIVE-26655:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Mar/23 11:50
            Start Date: 27/Mar/23 11:50
    Worklog Time Spent: 10m 
      Work Description: abstractdog commented on code in PR #4158:
URL: https://github.com/apache/hive/pull/4158#discussion_r1149197845


##########
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java:
##########
@@ -602,4 +602,11 @@ public void assignRowColumn(VectorizedRowBatch batch, int batchIndex, int column
     Aggregation bfAgg = (Aggregation) agg;
     outputColVector.setVal(batchIndex, bfAgg.bfBytes, 0, bfAgg.bfBytes.length);
   }
+
+  /**
+   * Let's clone the batch when we're working in parallel, see HIVE-26655.
+   */
+  public boolean batchNeedsClone() {
+    return numThreads > 0;
+  }

Review Comment:
   Yes, this is still needed: with numThreads = 1 the executor starts 
processing the bloom filter asynchronously on one thread while the main 
thread is fetching the next batch.
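
   To illustrate the point above, here is a minimal, self-contained sketch 
of the hazard. It is not the Hive code: `BatchCloneSketch`, `process`, and 
the `long[]` buffer standing in for a reused batch are all hypothetical 
names chosen for illustration, and the latch only forces the worst-case 
ordering so the outcome is reproducible. The assumption is simply that the 
producer reuses one buffer while a single worker thread consumes it later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names exist in Hive.
public class BatchCloneSketch {

  /**
   * Pushes two "batches" through one reused buffer to a single-thread
   * executor and returns the sum the worker computed for each submission.
   */
  static List<Long> process(boolean cloneBatch) {
    ExecutorService executor = Executors.newSingleThreadExecutor(); // one worker thread
    CountDownLatch producerDone = new CountDownLatch(1);
    long[] reused = new long[2];            // stands in for the reused batch
    long[][] inputs = {{1, 2}, {10, 20}};
    List<Future<Long>> futures = new ArrayList<>();
    try {
      for (long[] input : inputs) {
        // The main thread "fetches" the next batch into the same buffer.
        System.arraycopy(input, 0, reused, 0, input.length);
        final long[] handedOff = cloneBatch ? reused.clone() : reused;
        futures.add(executor.submit(() -> {
          // Force the worst case: the worker reads only after the
          // producer has already moved on to later batches.
          producerDone.await();
          long sum = 0;
          for (long v : handedOff) {
            sum += v;
          }
          return sum;
        }));
      }
      producerDone.countDown();
      List<Long> sums = new ArrayList<>();
      for (Future<Long> f : futures) {
        sums.add(f.get());
      }
      return sums;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("without clone: " + process(false)); // [30, 30]
    System.out.println("with clone:    " + process(true));  // [3, 30]
  }
}
```

   Without the clone, the first task sees the contents of the second batch, 
because the buffer was overwritten before the worker read it. With the 
clone, each task sees the data it was handed. One worker thread is already 
enough to hit this, which is why the check is `numThreads > 0` rather than 
`numThreads > 1`.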





Issue Time Tracking
-------------------

    Worklog Id:     (was: 853152)
    Time Spent: 40m  (was: 0.5h)

> VectorUDAFBloomFilterMerge should take care of safe batch handling when 
> working in parallel
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26655
>                 URL: https://issues.apache.org/jira/browse/HIVE-26655
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sungwoo Park
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> When tested with 100GB ORC tables, the number of rows returned by query 17 is 
> not stable. It returns fewer rows than the correct result (55 rows).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
