[ https://issues.apache.org/jira/browse/HIVE-26655?focusedWorklogId=853152&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-853152 ]
ASF GitHub Bot logged work on HIVE-26655:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Mar/23 11:50
            Start Date: 27/Mar/23 11:50
    Worklog Time Spent: 10m
      Work Description: abstractdog commented on code in PR #4158:
URL: https://github.com/apache/hive/pull/4158#discussion_r1149197845


##########
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java:
##########
@@ -602,4 +602,11 @@ public void assignRowColumn(VectorizedRowBatch batch, int batchIndex, int column
     Aggregation bfAgg = (Aggregation) agg;
     outputColVector.setVal(batchIndex, bfAgg.bfBytes, 0, bfAgg.bfBytes.length);
   }
+
+  /**
+   * Let's clone the batch when we're working in parallel, see HIVE-26655.
+   */
+  public boolean batchNeedsClone() {
+    return numThreads > 0;
+  }

Review Comment:
   Still needed, yes: threads=1 means the executor starts processing the bloom filter asynchronously on 1 thread while the main thread is fetching the next batch.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 853152)
    Time Spent: 40m  (was: 0.5h)

> VectorUDAFBloomFilterMerge should take care of safe batch handling when working in parallel
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26655
>                 URL: https://issues.apache.org/jira/browse/HIVE-26655
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sungwoo Park
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> When tested with 100GB ORC tables, the number of rows returned by query 17 is not stable. It returns fewer rows than the correct result (55 rows).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
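The reasoning in the review comment (why `batchNeedsClone()` must return true even when `numThreads` is 1) can be illustrated with a small, self-contained Java sketch. It is not Hive code: the class `BatchCloneSketch`, its `process` method, and the use of a plain `long[]` in place of a `VectorizedRowBatch` are all hypothetical. The main thread reuses one buffer for every batch, as a vectorized reader reuses its batch object, while a single-thread executor consumes the batches asynchronously, mirroring the one-thread case. A `CountDownLatch` holds the worker back until the reader has fetched every batch, which turns a timing-dependent race into a deterministic one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch (not Hive code): one reused buffer, one asynchronous consumer. */
public class BatchCloneSketch {

    /** Sums every value of every batch, handing each batch to a 1-thread executor. */
    public static long process(long[][] input, boolean cloneBatch) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(); // like numThreads = 1
        // Blocks the worker until the reader has fetched all batches, so the
        // "reader overwrote the buffer" case happens every time, not by chance.
        CountDownLatch readerDone = new CountDownLatch(1);
        long[] reusedBatch = new long[input[0].length]; // stands in for a reused batch object
        List<Future<Long>> results = new ArrayList<>();
        try {
            for (long[] rows : input) {
                // "Fetch the next batch" by overwriting the shared buffer in place.
                System.arraycopy(rows, 0, reusedBatch, 0, rows.length);
                // Without a clone, the task only holds a reference to the shared buffer.
                long[] toProcess = cloneBatch ? reusedBatch.clone() : reusedBatch;
                results.add(executor.submit(() -> {
                    readerDone.await();
                    long sum = 0;
                    for (long v : toProcess) {
                        sum += v;
                    }
                    return sum;
                }));
            }
            readerDone.countDown(); // let the worker start consuming
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get();
            }
            return total;
        } finally {
            readerDone.countDown(); // never leave the worker blocked on failure
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long[][] batches = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        System.out.println("with clone:    " + process(batches, true));
        System.out.println("without clone: " + process(batches, false));
    }
}
```

Running `main` prints 45 with cloning (1 + 2 + ... + 9) and 72 without it, because every task ends up reading the last batch {7, 8, 9}. In a real operator the overlap depends on timing rather than a latch, which fits the report of unstable row counts for query 17, though this sketch does not reproduce the actual bloom filter merge.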