Here's the pull request: https://github.com/apache/calcite/pull/1240

On Wed, May 29, 2019 at 1:58 PM Laurent Goujon <laur...@dremio.com> wrote:

> Looks like most {{RelNode#create()}} access the RelOptCluster/RelBuilder
> instance from their child, and some then, perform metadata operation, which
> would cause CyclicMetadataException for example. I'll create a fixup patch.
>
> On Wed, May 29, 2019 at 9:00 AM Laurent Goujon <laur...@dremio.com> wrote:
>
>> I actually did the change from using a static relbuilder to a mix of
>> static (to create the scans) and per test because of concurrency issues :(
>> Maybe the test should only been using a per test-case relbuilder instead,
>> but it would mean that there's some code relying on the static relbuilder
>> (by accessing it from the scan rel nodes) in a non-static context. I'll try
>> to debug it too (but so far was unlucky reproducing the jenkins problem).
>>
>> On Wed, May 29, 2019 at 5:02 AM Francis Chuang <francischu...@apache.org>
>> wrote:
>>
>>> In Go, there's a built in race detector that can be used when running
>>> tests using `go test`.
>>>
>>> There's RacerD [1] from Facebook that can detect races in Java. Perhaps
>>> this is something that can be looked in to to find the race.
>>>
>>> [1] https://fbinfer.com/docs/racerd.html
>>>
>>> On 29/05/2019 9:52 pm, Stamatis Zampetakis wrote:
>>> > Good insights, Ruben, Danny!
>>> >
>>> > Assuming that is indeed a concurrency problem, it will be difficult to
>>> > identify since many parts in Calcite are not thread safe. Not being
>>> able to
>>> > reproduce the problem makes the things even worse.
>>> >
>>> > Given that intermittent test failures occur often on Jenkins, how about
>>> > creating a new branch dedicated to debugging?
>>> > It can be associated with a new Jenkins jobs (that obviously doesn't
>>> send
>>> > mails to everybody when there are failures) that can be launched
>>> on-demand
>>> > by the person who is looking into the problem.
>>> >
>>> > On the other hand, we do not really make an effort to have tests that
>>> can
>>> > be executed concurrently so another alternative would be to run tests
>>> only
>>> > sequentially.
>>> > I am not very fan of this approach since it will rather hide problems
>>> than
>>> > solve them.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Wed, May 29, 2019 at 1:02 PM Yuzhao Chen <yuzhao....@gmail.com>
>>> wrote:
>>> >
>>> >> Thanks Ruben for your good analysis.
>>> >>
>>> >> What I’m confused is that isn’t the static REL_BUILDER more prone to
>>> have
>>> >> concurrency problems ? And the pushed scans(EMP_SCAN and DEPT_SCAN)
>>> are all
>>> >> nodes(immutable), how could this be a problem ?
>>> >>
>>> >> Best,
>>> >> Danny Chan
>>> >> 在 2019年5月29日 +0800 PM5:37,Ruben Q L <rube...@gmail.com>,写道:
>>> >>> I'm checking the commit [1] and I see something strange in
>>> >> RelOptUtilTest.
>>> >>> Maybe I'm wrong and it is nothing, but just in case it may help:
>>> >>>
>>> >>> With the latest modification, it seems that we have two
>>> RelBuilder(s) in
>>> >>> place:
>>> >>> - A static one that is created ad-hoc on a static block to generate
>>> the
>>> >>> EMP_SCAN and DEPT_SCAN RelNodes [2]
>>> >>> - An instance one to be used in the tests, that is initialized on
>>> >>> the @Before public void setUp() method [3]
>>> >>>
>>> >>> Before this commit, the EMP_SCAN / DEPT_SCAN were only used to read
>>> their
>>> >>> rowTypes to test some join auxiliary methods. But the new
>>> >>> tests testPushDownJoinConditions* actually build a plan and push
>>> these
>>> >>> scans into the RelBuilder to be tested [4] (which is a different one
>>> than
>>> >>> the static RelBuider that created the scans).
>>> >>> Maybe this is no problem generally, but it can potentially be under
>>> >> certain
>>> >>> circumstances?, which would explain the randomness of the issue.
>>> >>> Could this explain the exception?
>>> >>>
>>> >>> [1]
>>> >>>
>>> >>
>>> https://github.com/apache/calcite/commit/82e7d4e760cb203d31956c55e38e0fdd56119d58
>>> >>>
>>> >>> [2]
>>> >>>
>>> >>
>>> https://github.com/apache/calcite/blob/ac40d6951bc8c475ca6804be6d878107cc2ebb13/core/src/test/java/org/apache/calcite/plan/RelOptUtilTest.java#L71
>>> >>> [3]
>>> >>>
>>> >>
>>> https://github.com/apache/calcite/blob/ac40d6951bc8c475ca6804be6d878107cc2ebb13/core/src/test/java/org/apache/calcite/plan/RelOptUtilTest.java#L92
>>> >>> [4]
>>> >>>
>>> >>
>>> https://github.com/apache/calcite/blob/ac40d6951bc8c475ca6804be6d878107cc2ebb13/core/src/test/java/org/apache/calcite/plan/RelOptUtilTest.java#L292
>>> >>>
>>> >>>
>>> >>>
>>> >>> Le mer. 29 mai 2019 à 02:20, Julian Hyde <jh...@apache.org> a écrit
>>> :
>>> >>>
>>> >>>> It’s a tough call. It is probable that the problem existed already
>>> and
>>> >> the
>>> >>>> change merely surfaced it.
>>> >>>>
>>> >>>>> On May 28, 2019, at 5:17 PM, Stamatis Zampetakis <
>>> zabe...@gmail.com>
>>> >>>> wrote:
>>> >>>>>
>>> >>>>> It is not the only test that is failing after commit [1] but all
>>> the
>>> >> new
>>> >>>>> tests that were added.
>>> >>>>>
>>> >>>>> I've seen the problem on Jenkins on all JDKS but I cannot reproduce
>>> >> it
>>> >>>>> locally.
>>> >>>>> I guess we have to do with a race condition most likely due to the
>>> >>>>> concurrent execution of tests with surefire.
>>> >>>>>
>>> >>>>> Should we revert the commit till we find a solution?
>>> >>>>>
>>> >>>>> [1]
>>> >>>>>
>>> >>>>
>>> >>
>>> https://github.com/apache/calcite/commit/82e7d4e760cb203d31956c55e38e0fdd56119d58
>>> >>>>>
>>> >>>>> On Tue, May 28, 2019 at 7:57 PM Julian Hyde <jh...@apache.org>
>>> >> wrote:
>>> >>>>>
>>> >>>>>> I have seen this intermittent failure 3 times in the last week:
>>> >>>>>>
>>> >>>>>> [INFO] Running org.apache.calcite.plan.RelOptUtilTest
>>> >>>>>> [ERROR] Tests run: 11, Failures: 0, Errors: 1, Skipped: 0, Time
>>> >> elapsed:
>>> >>>>>> 0.411 s <<< FAILURE! - in org.apache.calcite.plan.RelOptUtilTest
>>> >>>>>> [ERROR]
>>> >>>>>>
>>> >>>>
>>> >>
>>> testPushDownJoinConditionsWithExpandedIsNotDistinctUsingCase(org.apache.calcite.plan.RelOptUtilTest)
>>> >>>>>> Time elapsed: 0.349 s <<< ERROR!
>>> >>>>>> org.apache.calcite.rel.metadata.CyclicMetadataException
>>> >>>>>> at
>>> >>>>>>
>>> >>>>
>>> >>
>>> org.apache.calcite.plan.RelOptUtilTest.testPushDownJoinConditionsWithExpandedIsNotDistinctUsingCase(RelOptUtilTest.java:445)
>>> >>>>>>
>>> >>>>>> I have seen it on Oracle JDK 12 and OpenJDK 10. The test was only
>>> >> added
>>> >>>> on
>>> >>>>>> May 22 so I assume that it will continue to fail intermittently
>>> >> until
>>> >>>> we do
>>> >>>>>> something.
>>> >>>>>>
>>> >>>>>> Anyone have any ideas?
>>> >>>>>>
>>> >>>>>> Laurent, As you added the test can you please look into it?
>>> >>>>>>
>>> >>>>>> Julian
>>> >>>>>>
>>> >>>>>>
>>> >>>>
>>> >>>>
>>> >>
>>> >
>>>
>>

Reply via email to