pitrou commented on code in PR #41335:
URL: https://github.com/apache/arrow/pull/41335#discussion_r1599962402


##########
cpp/src/arrow/acero/hash_join_node_test.cc:
##########
@@ -3201,5 +3203,55 @@ TEST(HashJoin, ChainedIntegerHashJoins) {
   }
 }
 
+// Test that a large number of joins don't overflow the temp vector stack, like GH-39582
+// and GH-39951.
+TEST(HashJoin, ManyJoins) {
+  // The idea of this case is to create many nested join nodes that may possibly cause
+  // recursive usage of temp vector stack. To make sure that the recursion happens:
+  // 1. A left-deep join tree is created so that the left-most (the final probe side)
+  // table will go through all the hash tables from the right side.
+  // 2. Left-outer join is used so that every join will increase the cardinality.
+  // 3. The left-most table contains rows of unique integers from 0 to N.
+  // 4. Each right table at level i contains two rows of integer i, so that the probing of
+  // each level will increase the result by one row.
+  // 5. The left-most table is a single batch of enough rows, so that at each level, the
+  // probing will accumulate enough result rows to have to output to the subsequent level
+  // before finishing the current batch (releasing the buffer allocated on the temp vector
+  // stack), which is essentially the recursive usage of the temp vector stack.
+
+  // A fair number of joins to guarantee temp vector stack overflow before GH-41335.
+  const int num_joins = 64;
+
+  // `ExecBatchBuilder::num_rows_max()` is the number of rows for swiss join to accumulate
+  // before outputting.
+  const int num_left_rows = ExecBatchBuilder::num_rows_max();
+  ASSERT_OK_AND_ASSIGN(
+      auto left_batches,
+      MakeIntegerBatches({[](int row_id) -> int64_t { return row_id; }},
+                         schema({field("l_key", int8())}),

Review Comment:
   Shouldn't this use a wider type in case `num_rows_max` is above 127?
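
   To illustrate the concern, here is a minimal standalone sketch (no Arrow dependency). `kAssumedNumRowsMax`, `RowIdsFit` and `CountDistinctKeys` are hypothetical names introduced only for this example, and the value `1 << 15` is an assumption standing in for whatever `ExecBatchBuilder::num_rows_max()` actually returns. How `MakeIntegerBatches` treats out-of-range values is not visible in this hunk, so the sketch only shows that an 8-bit key cannot hold that many unique row ids.

   ```cpp
   #include <cassert>
   #include <cstdint>
   #include <limits>
   #include <set>

   // Hypothetical stand-in for ExecBatchBuilder::num_rows_max(); the real
   // value comes from Arrow and is only assumed here.
   constexpr int kAssumedNumRowsMax = 1 << 15;

   // True if every row id in [0, num_rows) is representable in T.
   template <typename T>
   constexpr bool RowIdsFit(int64_t num_rows) {
     return num_rows <= 0 ||
            num_rows - 1 <= static_cast<int64_t>(std::numeric_limits<T>::max());
   }

   // Counts how many distinct keys survive when row ids are narrowed to T.
   // An 8-bit type can never yield more than 256 distinct values.
   template <typename T>
   int64_t CountDistinctKeys(int64_t num_rows) {
     assert(num_rows >= 0);
     std::set<T> keys;
     for (int64_t i = 0; i < num_rows; ++i) {
       keys.insert(static_cast<T>(i));
     }
     return static_cast<int64_t>(keys.size());
   }

   // Under the assumed row count, int8 is too narrow while int32 is enough.
   static_assert(!RowIdsFit<int8_t>(kAssumedNumRowsMax), "int8 is too narrow");
   static_assert(RowIdsFit<int32_t>(kAssumedNumRowsMax), "int32 is wide enough");
   ```

   If the assumption holds, one possible fix would be declaring the key as `field("l_key", int32())`, so that the "unique integers from 0 to N" property described in the test comment is preserved.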



