Thanks a lot Laurent, much appreciated!

BTW, were you able to reproduce the problem?

On Wed, May 29, 2019 at 11:29 PM Laurent Goujon <laur...@dremio.com> wrote:

> Here's the pull request: https://github.com/apache/calcite/pull/1240
>
> On Wed, May 29, 2019 at 1:58 PM Laurent Goujon <laur...@dremio.com> wrote:
>
> > Looks like most {{RelNode#create()}} access the RelOptCluster/RelBuilder
> > instance from their child, and some then, perform metadata operation,
> which
> > would cause CyclicMetadataException for example. I'll create a fixup
> patch.
> >
> > On Wed, May 29, 2019 at 9:00 AM Laurent Goujon <laur...@dremio.com>
> wrote:
> >
> >> I actually did the change from using a static relbuilder to a mix of
> >> static (to create the scans) and per test because of concurrency issues
> :(
> >> Maybe the test should only been using a per test-case relbuilder
> instead,
> >> but it would mean that there's some code relying on the static
> relbuilder
> >> (by accessing it from the scan rel nodes) in a non-static context. I'll
> try
> >> to debug it too (but so far was unlucky reproducing the jenkins
> problem).
> >>
> >> On Wed, May 29, 2019 at 5:02 AM Francis Chuang <
> francischu...@apache.org>
> >> wrote:
> >>
> >>> In Go, there's a built in race detector that can be used when running
> >>> tests using `go test`.
> >>>
> >>> There's RacerD [1] from Facebook that can detect races in Java. Perhaps
> >>> this is something that can be looked in to to find the race.
> >>>
> >>> [1] https://fbinfer.com/docs/racerd.html
> >>>
> >>> On 29/05/2019 9:52 pm, Stamatis Zampetakis wrote:
> >>> > Good insights, Ruben, Danny!
> >>> >
> >>> > Assuming that is indeed a concurrency problem, it will be difficult
> to
> >>> > identify since many parts in Calcite are not thread safe. Not being
> >>> able to
> >>> > reproduce the problem makes the things even worse.
> >>> >
> >>> > Given that intermittent test failures occur often on Jenkins, how
> about
> >>> > creating a new branch dedicated to debugging?
> >>> > It can be associated with a new Jenkins jobs (that obviously doesn't
> >>> send
> >>> > mails to everybody when there are failures) that can be launched
> >>> on-demand
> >>> > by the person who is looking into the problem.
> >>> >
> >>> > On the other hand, we do not really make an effort to have tests that
> >>> can
> >>> > be executed concurrently so another alternative would be to run tests
> >>> only
> >>> > sequentially.
> >>> > I am not very fan of this approach since it will rather hide problems
> >>> than
> >>> > solve them.
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > On Wed, May 29, 2019 at 1:02 PM Yuzhao Chen <yuzhao....@gmail.com>
> >>> wrote:
> >>> >
> >>> >> Thanks Ruben for your good analysis.
> >>> >>
> >>> >> What I’m confused is that isn’t the static REL_BUILDER more prone to
> >>> have
> >>> >> concurrency problems ? And the pushed scans(EMP_SCAN and DEPT_SCAN)
> >>> are all
> >>> >> nodes(immutable), how could this be a problem ?
> >>> >>
> >>> >> Best,
> >>> >> Danny Chan
> >>> >> 在 2019年5月29日 +0800 PM5:37,Ruben Q L <rube...@gmail.com>,写道:
> >>> >>> I'm checking the commit [1] and I see something strange in
> >>> >> RelOptUtilTest.
> >>> >>> Maybe I'm wrong and it is nothing, but just in case it may help:
> >>> >>>
> >>> >>> With the latest modification, it seems that we have two
> >>> RelBuilder(s) in
> >>> >>> place:
> >>> >>> - A static one that is created ad-hoc on a static block to generate
> >>> the
> >>> >>> EMP_SCAN and DEPT_SCAN RelNodes [2]
> >>> >>> - An instance one to be used in the tests, that is initialized on
> >>> >>> the @Before public void setUp() method [3]
> >>> >>>
> >>> >>> Before this commit, the EMP_SCAN / DEPT_SCAN were only used to read
> >>> their
> >>> >>> rowTypes to test some join auxiliary methods. But the new
> >>> >>> tests testPushDownJoinConditions* actually build a plan and push
> >>> these
> >>> >>> scans into the RelBuilder to be tested [4] (which is a different
> one
> >>> than
> >>> >>> the static RelBuider that created the scans).
> >>> >>> Maybe this is no problem generally, but it can potentially be under
> >>> >> certain
> >>> >>> circumstances?, which would explain the randomness of the issue.
> >>> >>> Could this explain the exception?
> >>> >>>
> >>> >>> [1]
> >>> >>>
> >>> >>
> >>>
> https://github.com/apache/calcite/commit/82e7d4e760cb203d31956c55e38e0fdd56119d58
> >>> >>>
> >>> >>> [2]
> >>> >>>
> >>> >>
> >>>
> https://github.com/apache/calcite/blob/ac40d6951bc8c475ca6804be6d878107cc2ebb13/core/src/test/java/org/apache/calcite/plan/RelOptUtilTest.java#L71
> >>> >>> [3]
> >>> >>>
> >>> >>
> >>>
> https://github.com/apache/calcite/blob/ac40d6951bc8c475ca6804be6d878107cc2ebb13/core/src/test/java/org/apache/calcite/plan/RelOptUtilTest.java#L92
> >>> >>> [4]
> >>> >>>
> >>> >>
> >>>
> https://github.com/apache/calcite/blob/ac40d6951bc8c475ca6804be6d878107cc2ebb13/core/src/test/java/org/apache/calcite/plan/RelOptUtilTest.java#L292
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> Le mer. 29 mai 2019 à 02:20, Julian Hyde <jh...@apache.org> a
> écrit
> >>> :
> >>> >>>
> >>> >>>> It’s a tough call. It is probable that the problem existed already
> >>> and
> >>> >> the
> >>> >>>> change merely surfaced it.
> >>> >>>>
> >>> >>>>> On May 28, 2019, at 5:17 PM, Stamatis Zampetakis <
> >>> zabe...@gmail.com>
> >>> >>>> wrote:
> >>> >>>>>
> >>> >>>>> It is not the only test that is failing after commit [1] but all
> >>> the
> >>> >> new
> >>> >>>>> tests that were added.
> >>> >>>>>
> >>> >>>>> I've seen the problem on Jenkins on all JDKS but I cannot
> reproduce
> >>> >> it
> >>> >>>>> locally.
> >>> >>>>> I guess we have to do with a race condition most likely due to
> the
> >>> >>>>> concurrent execution of tests with surefire.
> >>> >>>>>
> >>> >>>>> Should we revert the commit till we find a solution?
> >>> >>>>>
> >>> >>>>> [1]
> >>> >>>>>
> >>> >>>>
> >>> >>
> >>>
> https://github.com/apache/calcite/commit/82e7d4e760cb203d31956c55e38e0fdd56119d58
> >>> >>>>>
> >>> >>>>> On Tue, May 28, 2019 at 7:57 PM Julian Hyde <jh...@apache.org>
> >>> >> wrote:
> >>> >>>>>
> >>> >>>>>> I have seen this intermittent failure 3 times in the last week:
> >>> >>>>>>
> >>> >>>>>> [INFO] Running org.apache.calcite.plan.RelOptUtilTest
> >>> >>>>>> [ERROR] Tests run: 11, Failures: 0, Errors: 1, Skipped: 0, Time
> >>> >> elapsed:
> >>> >>>>>> 0.411 s <<< FAILURE! - in org.apache.calcite.plan.RelOptUtilTest
> >>> >>>>>> [ERROR]
> >>> >>>>>>
> >>> >>>>
> >>> >>
> >>>
> testPushDownJoinConditionsWithExpandedIsNotDistinctUsingCase(org.apache.calcite.plan.RelOptUtilTest)
> >>> >>>>>> Time elapsed: 0.349 s <<< ERROR!
> >>> >>>>>> org.apache.calcite.rel.metadata.CyclicMetadataException
> >>> >>>>>> at
> >>> >>>>>>
> >>> >>>>
> >>> >>
> >>>
> org.apache.calcite.plan.RelOptUtilTest.testPushDownJoinConditionsWithExpandedIsNotDistinctUsingCase(RelOptUtilTest.java:445)
> >>> >>>>>>
> >>> >>>>>> I have seen it on Oracle JDK 12 and OpenJDK 10. The test was
> only
> >>> >> added
> >>> >>>> on
> >>> >>>>>> May 22 so I assume that it will continue to fail intermittently
> >>> >> until
> >>> >>>> we do
> >>> >>>>>> something.
> >>> >>>>>>
> >>> >>>>>> Anyone have any ideas?
> >>> >>>>>>
> >>> >>>>>> Laurent, As you added the test can you please look into it?
> >>> >>>>>>
> >>> >>>>>> Julian
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>
> >>> >>>>
> >>> >>
> >>> >
> >>>
> >>
>

Reply via email to