In Go, there's a built in race detector that can be used when running tests using `go test`.

There's RacerD [1] from Facebook that can detect races in Java. Perhaps this is something that can be looked in to to find the race.

[1] https://fbinfer.com/docs/racerd.html

On 29/05/2019 9:52 pm, Stamatis Zampetakis wrote:
Good insights, Ruben, Danny!

Assuming that is indeed a concurrency problem, it will be difficult to
identify since many parts in Calcite are not thread safe. Not being able to
reproduce the problem makes the things even worse.

Given that intermittent test failures occur often on Jenkins, how about
creating a new branch dedicated to debugging?
It can be associated with a new Jenkins jobs (that obviously doesn't send
mails to everybody when there are failures) that can be launched on-demand
by the person who is looking into the problem.

On the other hand, we do not really make an effort to have tests that can
be executed concurrently so another alternative would be to run tests only
sequentially.
I am not very fan of this approach since it will rather hide problems than
solve them.






On Wed, May 29, 2019 at 1:02 PM Yuzhao Chen <yuzhao....@gmail.com> wrote:

Thanks Ruben for your good analysis.

What I’m confused is that isn’t the static REL_BUILDER more prone to have
concurrency problems ? And the pushed scans(EMP_SCAN and DEPT_SCAN) are all
nodes(immutable), how could this be a problem ?

Best,
Danny Chan
在 2019年5月29日 +0800 PM5:37,Ruben Q L <rube...@gmail.com>,写道:
I'm checking the commit [1] and I see something strange in
RelOptUtilTest.
Maybe I'm wrong and it is nothing, but just in case it may help:

With the latest modification, it seems that we have two RelBuilder(s) in
place:
- A static one that is created ad-hoc on a static block to generate the
EMP_SCAN and DEPT_SCAN RelNodes [2]
- An instance one to be used in the tests, that is initialized on
the @Before public void setUp() method [3]

Before this commit, the EMP_SCAN / DEPT_SCAN were only used to read their
rowTypes to test some join auxiliary methods. But the new
tests testPushDownJoinConditions* actually build a plan and push these
scans into the RelBuilder to be tested [4] (which is a different one than
the static RelBuider that created the scans).
Maybe this is no problem generally, but it can potentially be under
certain
circumstances?, which would explain the randomness of the issue.
Could this explain the exception?

[1]

https://github.com/apache/calcite/commit/82e7d4e760cb203d31956c55e38e0fdd56119d58

[2]

https://github.com/apache/calcite/blob/ac40d6951bc8c475ca6804be6d878107cc2ebb13/core/src/test/java/org/apache/calcite/plan/RelOptUtilTest.java#L71
[3]

https://github.com/apache/calcite/blob/ac40d6951bc8c475ca6804be6d878107cc2ebb13/core/src/test/java/org/apache/calcite/plan/RelOptUtilTest.java#L92
[4]

https://github.com/apache/calcite/blob/ac40d6951bc8c475ca6804be6d878107cc2ebb13/core/src/test/java/org/apache/calcite/plan/RelOptUtilTest.java#L292



Le mer. 29 mai 2019 à 02:20, Julian Hyde <jh...@apache.org> a écrit :

It’s a tough call. It is probable that the problem existed already and
the
change merely surfaced it.

On May 28, 2019, at 5:17 PM, Stamatis Zampetakis <zabe...@gmail.com>
wrote:

It is not the only test that is failing after commit [1] but all the
new
tests that were added.

I've seen the problem on Jenkins on all JDKS but I cannot reproduce
it
locally.
I guess we have to do with a race condition most likely due to the
concurrent execution of tests with surefire.

Should we revert the commit till we find a solution?

[1]


https://github.com/apache/calcite/commit/82e7d4e760cb203d31956c55e38e0fdd56119d58

On Tue, May 28, 2019 at 7:57 PM Julian Hyde <jh...@apache.org>
wrote:

I have seen this intermittent failure 3 times in the last week:

[INFO] Running org.apache.calcite.plan.RelOptUtilTest
[ERROR] Tests run: 11, Failures: 0, Errors: 1, Skipped: 0, Time
elapsed:
0.411 s <<< FAILURE! - in org.apache.calcite.plan.RelOptUtilTest
[ERROR]


testPushDownJoinConditionsWithExpandedIsNotDistinctUsingCase(org.apache.calcite.plan.RelOptUtilTest)
Time elapsed: 0.349 s <<< ERROR!
org.apache.calcite.rel.metadata.CyclicMetadataException
at


org.apache.calcite.plan.RelOptUtilTest.testPushDownJoinConditionsWithExpandedIsNotDistinctUsingCase(RelOptUtilTest.java:445)

I have seen it on Oracle JDK 12 and OpenJDK 10. The test was only
added
on
May 22 so I assume that it will continue to fail intermittently
until
we do
something.

Anyone have any ideas?

Laurent, As you added the test can you please look into it?

Julian






Reply via email to