Github user gdfm commented on the pull request:
https://github.com/apache/incubator-samoa/pull/11#issuecomment-101618723
Indeed, I see what you mean. Given that the feedback loop in Flink is
faster, the number of attempts to split should increase.
This is expected, but the number of such attempts is bounded above by the
number tried on the Local engine, where there is no delay between the request
for the split criterion and the response from the local statistics.
We already have some flow control to regulate the rate of ingestion in
PrequentialEvaluation. I'll play a bit with it to see what happens.
When you put the 2-second delay in the Flink Processors, what happens (I
guess) is that all the data streams through a very rough, sub-optimal version
of the tree. So it's very fast, but the precision drops considerably because of
the artificial limit on the number of split attempts.
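
To make the reasoning above concrete, here is a rough toy model of the effect, not SAMOA code. It assumes a single leaf, a hypothetical grace period (instances to see before asking for a split), and a feedback delay expressed as the number of instances that stream past while the response is pending. The function name and all parameters are made up for illustration.

```python
def count_split_attempts(num_instances, grace_period, feedback_delay):
    """Count split attempts at one leaf in a toy model.

    Assumptions (illustrative only, not SAMOA's actual logic):
    - a split is attempted after `grace_period` instances have been counted;
    - while a response is pending, `feedback_delay` instances stream past
      without counting toward the next attempt.
    """
    attempts = 0
    seen_since_last_attempt = 0
    pending = 0  # instances still to pass before the response arrives

    for _ in range(num_instances):
        if pending > 0:
            # The response has not come back yet: the instance goes through
            # the current (unsplit) tree and no new attempt can start.
            pending -= 1
            continue
        seen_since_last_attempt += 1
        if seen_since_last_attempt >= grace_period:
            attempts += 1
            seen_since_last_attempt = 0
            pending = feedback_delay
    return attempts


# Zero delay plays the role of the Local engine in this toy model.
local_like = count_split_attempts(10_000, grace_period=200, feedback_delay=0)
delayed = count_split_attempts(10_000, grace_period=200, feedback_delay=1_800)
print(local_like, delayed)
```

In this model the zero-delay case gives the largest number of attempts, and any added delay can only reduce it, which is the bound I was describing.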