EnricoMi commented on code in PR #44470:
URL: https://github.com/apache/arrow/pull/44470#discussion_r2079049038
##########
cpp/src/arrow/dataset/file_test.cc:
##########
@@ -353,6 +356,89 @@ TEST_F(TestFileSystemDataset, WriteProjected) {
}
}
+// this kernel delays execution for some specific scalar values
+Status delay(compute::KernelContext* ctx, const compute::ExecSpan& batch,
+ compute::ExecResult* out) {
+ const ArraySpan& input = batch[0].array;
+ const uint32_t* input_values = input.GetValues<uint32_t>(1);
+ uint8_t* output_values = out->array_span()->buffers[1].data;
+
+ // Boolean data is stored in 1 bit per value
+ for (int64_t i = 0; i < input.length; ++i) {
+ if (input_values[i] % 16 == 0) {
+ std::this_thread::sleep_for(std::chrono::milliseconds(10));
+ }
Review Comment:
On my machine, a single non-delayed batch ends up out of order (_ooo_) with a
probability of 1%, a delayed batch with 67%. If the chance for a single batch
to be _ooo_ is $p$, and batches are treated as independent, then the chance
for at least one of $N$ batches to be _ooo_ is $1 - (1-p)^N$. Therefore, the
chance for the whole sequence to be _ooo_ is $1 -
(1-\frac{1}{100})^{1024-128} = 99.988\\%$ from the non-delayed batches alone
and $1 - (1-\frac{67}{100})^{128} \approx 100\\%$ from the delayed batches alone.
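
For anyone who wants to re-check the arithmetic, here is a minimal sketch of
the formula above. The helper name `AtLeastOneOutOfOrder` is hypothetical and
not part of the PR, and it assumes batches go _ooo_ independently of each
other, which the formula also assumes:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of the PR): probability that at least one of
// `n` batches is out of order, given a per-batch probability `p`.
// Assumes batches are independent, as the formula 1 - (1 - p)^n does.
double AtLeastOneOutOfOrder(double p, int n) {
  assert(p >= 0.0 && p <= 1.0);
  assert(n >= 0);
  // 1 - (1 - p)^n rewritten as -expm1(n * log1p(-p)), which keeps precision
  // when p is small.
  return -std::expm1(n * std::log1p(-p));
}
```

Calling it with `(0.01, 1024 - 128)` covers the non-delayed batches and
`(0.67, 128)` the delayed ones; the per-batch probabilities are the
measurements from my machine quoted above.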
So the delay does not look strictly necessary. It is a safety net that makes
sure _ooo_ occurs (the per-batch chance increases by almost two orders of
magnitude, from 1% to 67%), and having the delay code in the unit test
stresses the fact that this is sensitive to timing.