The issue with the failing test
TestDrillbitResilience.cancelAfterAllResultsProduced is similar to DRILL-3967
(the TestDrillbitResilience.cancelAfterEverythingIsCompleted failure).

In both this test case and DRILL-3967, the query is paused (in different
places) and a cancel is sent. The query is then resumed and the resulting
state is checked. The problem is that the tests have a race condition
between the cancellation and the resumption of the query: sometimes the
resume arrives first and sometimes the cancel arrives first. The failure
described by Volodymyr is caused by this race condition. I don't know why
the test was written this way, but this is an existing problem and shouldn't
hold up the release.
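
To make the race concrete, here is a minimal sketch (not Drill code). It
assumes a simplified model in which the paused query acts on whichever signal
arrives first: a cancel that arrives first cancels the query, while a resume
that arrives first lets it run to completion. The class, enum, and method
names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class CancelResumeRaceSketch {
    enum Signal { CANCEL, RESUME }
    enum Outcome { CANCELED, COMPLETED }

    // Hypothetical model: the first signal to arrive decides the final state.
    static Outcome outcome(List<Signal> arrivalOrder) {
        if (arrivalOrder.isEmpty()) {
            throw new IllegalArgumentException("no signal delivered");
        }
        return arrivalOrder.get(0) == Signal.CANCEL
                ? Outcome.CANCELED
                : Outcome.COMPLETED;
    }

    public static void main(String[] args) {
        // Same two signals, different arrival order, different final state.
        System.out.println(outcome(Arrays.asList(Signal.CANCEL, Signal.RESUME)));
        System.out.println(outcome(Arrays.asList(Signal.RESUME, Signal.CANCEL)));
    }
}
```

A test that always asserts one of the two outcomes will pass or fail
depending on which order the signals happen to arrive in.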

However, I also see a failure where we encounter an illegal state
transition (the query state is CANCELLATION_REQUESTED and the Foreman tries
to move to the ENQUEUED state). This happens about once in twenty-five
executions. The Foreman only requests the move to ENQUEUED once, when the
query is about to start, so this means the cancellation request arrived
before the query start request. I have not yet been able to determine how
this happens in the unit test (it really shouldn't be possible).
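
The shape of that failure can be sketched with a small transition guard. This
is not the Foreman's actual state machine: only CANCELLATION_REQUESTED and
ENQUEUED come from the failure above, and the other states, the transition
table, and all names here are assumptions made for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

public class QueryStateSketch {
    // Hypothetical, reduced set of states; not Drill's real enum.
    enum State { PREPARING, ENQUEUED, RUNNING, CANCELLATION_REQUESTED, CANCELED }

    // Assumed transition table: a cancel may be requested from any live state,
    // but nothing may move back to ENQUEUED once a cancel has been requested.
    static Set<State> allowedFrom(State s) {
        switch (s) {
            case PREPARING:
                return EnumSet.of(State.ENQUEUED, State.CANCELLATION_REQUESTED);
            case ENQUEUED:
                return EnumSet.of(State.RUNNING, State.CANCELLATION_REQUESTED);
            case RUNNING:
                return EnumSet.of(State.CANCELLATION_REQUESTED);
            case CANCELLATION_REQUESTED:
                return EnumSet.of(State.CANCELED);
            default:
                return EnumSet.noneOf(State.class);
        }
    }

    static State moveTo(State current, State next) {
        if (!allowedFrom(current).contains(next)) {
            throw new IllegalStateException(
                    "Illegal transition: " + current + " -> " + next);
        }
        return next;
    }

    public static void main(String[] args) {
        // Expected order: enqueue first, cancel afterwards.
        State s = moveTo(State.PREPARING, State.ENQUEUED);
        s = moveTo(s, State.CANCELLATION_REQUESTED);

        // Raced order: the cancel lands before the enqueue request.
        State raced = moveTo(State.PREPARING, State.CANCELLATION_REQUESTED);
        try {
            moveTo(raced, State.ENQUEUED);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```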

To reproduce the problem I simply added a repeat rule to the class and set a
repeat count of 1000. The problem occurs easily if the test is run from the
command line. When running in debug mode, I was unable to see the problem.
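
The repeat-until-it-fails idea can be sketched without any test framework.
This is a plain loop, not the JUnit repeat rule I used, and the flaky body is
a hypothetical stand-in that fails deterministically on every 25th call to
mimic the roughly one-in-twenty-five rate seen above.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RepeatSketch {
    // Runs the body repeatCount times and returns how many runs failed.
    static int countFailures(int repeatCount, Runnable testBody) {
        int failures = 0;
        for (int i = 0; i < repeatCount; i++) {
            try {
                testBody.run();
            } catch (RuntimeException | AssertionError e) {
                failures++;
            }
        }
        return failures;
    }

    // Hypothetical stand-in for the flaky test: fails on every 25th call.
    static Runnable flakyBody() {
        AtomicInteger calls = new AtomicInteger();
        return () -> {
            if (calls.incrementAndGet() % 25 == 0) {
                throw new IllegalStateException("simulated illegal state transition");
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(countFailures(1000, flakyBody())); // prints 40
    }
}
```

A real race will not fail on a fixed schedule, so the count from an actual
run will vary from one execution to the next.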

I'll spend some more time on this, but just in case someone wants to
investigate further, feel free ...


On Thu, Mar 8, 2018 at 10:59 AM, Parth Chandra <par...@apache.org> wrote:

> Not sure if that would work. The release build does not allow uncommitted
> files, so I have to commit pom.xml changes to at least the local repo,
> which will get pushed to my public repo when the release is done. Not
> committing this to Apache master would be cheating, and would leave us with
> a build that does not match any source in Apache master. The generated
> Javadoc is never committed to any repo. It is part of the src release jars,
> AFAIK.
>
> Also, I'm not sure where in the pom Jyothsna made the change; I added
> -Xdoclint:none to the build section of the apache-release profile and java
> exec still gives over 100 javadoc errors.
>
> We have to fix these one of these days. Might as well do it now. Knowing
> how it works, if we don't fix these now, someone will be scrambling to fix
> these just before the 1.14.0 release :(
>
>
>
> On Thu, Mar 8, 2018 at 10:06 AM, Aman Sinha <amansi...@apache.org> wrote:
>
>> Parth,  would it work if you made the pom.xml changes locally in your
>> branch, generated the javadoc but only commit the javadoc jar files to the
>> release branch, not the pom.xml changes ?
>> Anyone downloading Drill source code to build should not run into this
>> since typically they won't be building javadoc.
>>
>> -Aman
>>
>> On Wed, Mar 7, 2018 at 6:37 PM, Parth Chandra <par...@apache.org> wrote:
>>
>> > Unfortunately, we cannot do that since we also want to be able to build
>> > with JDK 7 for at least a couple of releases to allow for a reasonable
>> > transition time. doclint was introduced in JDK 8, so JDK 7 fails because
>> > it doesn't recognize the parameter.
>> >
>> >
>> >
>> > On Thu, Mar 8, 2018 at 7:03 AM, Jyothsna Reddy <jyothsna....@gmail.com>
>> > wrote:
>> >
>> > > Regarding DRILL-4547, I used Vladimir's branch (DRILL-1491) and added the
>> > > following lines to pom.xml to disable doc lint. The javadoc doesn't throw
>> > > any errors and the build is successful.
>> > >
>> > >       <activation>
>> > >         <jdk>[1.8,)</jdk>
>> > >       </activation>
>> > >       <properties>
>> > >         <additionalparam>-Xdoclint:none</additionalparam>
>> > >       </properties>
>> > >
>> > > On Wed, Mar 7, 2018 at 3:08 PM, Hanumath Rao Maduri <
>> hanu....@gmail.com>
>> > > wrote:
>> > >
>> > > > On my machine I couldn't repro the issue related to
>> > > > TestDrillbitResilience.cancelAfterAllResultsProduced.
>> > > > I used Vladimir's branch (i.e. DRILL-1491) and ran the tests with the
>> > > > maven test command.
>> > > >
>> > > > Output of the test run:
>> > > > ... 4 common frames omitted
>> > > > Tests run: 20, Failures: 0, Errors: 0, Skipped: 6, Time elapsed: 124.187 sec - in org.apache.drill.exec.server.TestDrillbitResilience
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On Wed, Mar 7, 2018 at 11:00 AM, Parth Chandra <par...@apache.org>
>> > > wrote:
>> > > >
>> > > > > Yes I agree. JDBC would be a new feature that we can defer to 1.14.0.
>> > > > > I'm hoping we can resolve the other three in the next few days. Target
>> > > > > date for starting the release process: Friday, Mar 9th.
>> > > > >
>> > > > > Once these are resolved, I will create a branch for the release so that
>> > > > > Apache master remains open for commits. If any issues are found in the
>> > > > > release branch, we will fix them in master and I will cherry-pick them
>> > > > > into the release branch. Once the release is finalized I will add a
>> > > > > release tag and remove the branch.
>> > > > >
>> > > > > Also note if QA folks want to get started on testing the release, the
>> > > > > current head of Apache master is close to final. Javadoc generation is
>> > > > > only a release build issue, and the other issues are localized to
>> > > > > specific cases.
>> > > > >
>> > > > > Note: to reproduce the javadoc issues:
>> > > > >    # set JAVA_HOME to JDK 8
>> > > > >    mvn javadoc:javadoc -Papache-release
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Wed, Mar 7, 2018 at 11:23 PM, Aman Sinha <amansi...@apache.org
>> >
>> > > > wrote:
>> > > > >
>> > > > > > It seems to me the main blockers are:
>> > > > > >
>> > > > > > 1. DRILL-4547    Javadoc fails with Java8   <-- Can we split up the
>> > > > > >    work among a few people to resolve these?
>> > > > > > 2. DRILL-6216    Metadata mismatch..         <-- Agreement was to
>> > > > > >    revert one small piece of code and it appears Sorabh is looking
>> > > > > >    into it
>> > > > > > 3. TestDrillbitResilience.cancelAfterAllResultsProduced  <-- need
>> > > > > >    someone to look into this
>> > > > > >
>> > > > > > Regarding the JDBC issues that Parth mentioned, looking at the JIRAs,
>> > > > > > it seems they are not showstoppers... Parth, do you agree?
>> > > > > >
>> > > > > > Since we are close to the finish line for JDK 8, IMO we should try and
>> > > > > > see if in another day or two we can get over these hurdles.
>> > > > > >
>> > > > > > -Aman
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Wed, Mar 7, 2018 at 7:17 AM, Pritesh Maker <pma...@mapr.com>
>> > > wrote:
>> > > > > >
>> > > > > > > The JDK 8 issues will likely require more time to harden for it to
>> > > > > > > be included in the 1.13 release. My recommendation would be to move
>> > > > > > > ahead with the 1.13 release now and address these issues right.
>> > > > > > >
>> > > > > > > Pritesh
>> > > > > > >
>> > > > > > > -----Original Message-----
>> > > > > > > From: Parth Chandra <par...@apache.org>
>> > > > > > > Sent: March 7, 2018 3:34 AM
>> > > > > > > To: dev <dev@drill.apache.org>
>> > > > > > > Subject: Re: [DISCUSS] 1.13.0 release
>> > > > > > >
>> > > > > > > My mistake Volodymyr.
>> > > > > > >
>> > > > > > > Found some other JDK 8 issues in JIRA not tracked in DRILL-1491:
>> > > > > > >
>> > > > > > >   DRILL-4547    Javadoc fails with Java8
>> > > > > > >   DRILL-6163    Switch Travis To Java 8
>> > > > > > >
>> > > > > > > The following are tracked in DRILL-1491, but it doesn't look like
>> > > > > > > we're addressing these. Are we?
>> > > > > > >
>> > > > > > >   DRILL-4329    13 Unit tests are failing with JDK 8
>> > > > > > >   DRILL-4333    DRILL-4329 tests in
>> > > > > > >                 Drill2489CallsAfterCloseThrowExceptionsTest fail in Java 8
>> > > > > > >   DRILL-5120    Upgrade JDBC Driver for new Java 8 methods
>> > > > > > >   DRILL-5680    BasicPhysicalOpUnitTest can't run in Eclipse with Java 8
>> > > > > > >
>> > > > > > >
>> > > > > > > *DRILL-4547 is a showstopper*. The release build (-Papache-release)
>> > > > > > > fails with far too many Javadoc errors even with doc lint turned off.
>> > > > > > >
>> > > > > > > DRILL-4333, DRILL-4329, DRILL-5120 are JDBC related, which is a
>> > > > > > > project by itself.
>> > > > > > >
>> > > > > > > Note that fixing JDBC related issues and adding the command line
>> > > > > > > option to turn doc lint off will likely break Java 7 builds.
>> > > > > > >
>> > > > > > >
>> > > > > > > Folks who voted to get JDK 8 into this release, what is the consensus
>> > > > > > > on JDBC/Java8?
>> > > > > > > Also, any volunteers to help debug
>> > > > > > > TestDrillbitResilience.cancelAfterAllResultsProduced?
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > On Wed, Mar 7, 2018 at 3:20 PM, Volodymyr Tkach <
>> > > > vovatkac...@gmail.com
>> > > > > >
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Addition to my last message:
>> > > > > > > > The link to the PR for DRILL-1491, on which we can see the
>> > > > > > > > TestDrillbitResilience.cancelAfterAllResultsProduced failure:
>> > > > > > > > https://github.com/apache/drill/pull/1143
>> > > > > > > >
>> > > > > > > > 2018-03-07 11:45 GMT+02:00 Volodymyr Tkach <
>> > > vovatkac...@gmail.com
>> > > > >:
>> > > > > > > >
>> > > > > > > > > *To Parth:*
>> > > > > > > > > The failure can only be seen if run on the DRILL-1491 branch,
>> > > > > > > > > because it uses jdk 1.8 in pom.xml
>> > > > > > > > >
>> > > > > > > > > <source>1.8</source>
>> > > > > > > > > <target>1.8</target>
>> > > > > > > > >
>> > > > > > > > > 2018-03-07 6:03 GMT+02:00 Sorabh Hamirwasia <
>> > > > shamirwa...@mapr.com
>> > > > > >:
>> > > > > > > > >
>> > > > > > > > >> Just sent an email on RCA of DRILL-6216 to discuss next steps.
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >> Thanks,
>> > > > > > > > >> Sorabh
>> > > > > > > > >>
>> > > > > > > > >> ________________________________
>> > > > > > > > >> From: Parth Chandra <par...@apache.org>
>> > > > > > > > >> Sent: Tuesday, March 6, 2018 6:48:21 PM
>> > > > > > > > >> To: dev
>> > > > > > > > >> Subject: Re: [DISCUSS] 1.13.0 release
>> > > > > > > > >>
>> > > > > > > > >> We have two items remaining -
>> > > > > > > > >>
>> > > > > > > > >> DRILL-1491 - Ideally, I would like to make sure that CANCEL is
>> > > > > > > > >> handled correctly with JDK 8. If the failure of the unit test is
>> > > > > > > > >> because the cancel is received after the query is completed, then
>> > > > > > > > >> the issue is less severe, but I would like to be sure that this is
>> > > > > > > > >> the case.
>> > > > > > > > >> Are there others who see the DrillbitResilience tests failing for
>> > > > > > > > >> them? Can we try to assist Volodymyr? I don't see the failures
>> > > > > > > > >> myself.
>> > > > > > > > >> DRILL-6216 - this is a showstopper.
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >> On Wed, Mar 7, 2018 at 5:27 AM, Kunal Khatua <
>> > > > > kunalkha...@gmail.com
>> > > > > > >
>> > > > > > > > >> wrote:
>> > > > > > > > >>
>> > > > > > > > >> > Hi Parth
>> > > > > > > > >> >
>> > > > > > > > >> > DRILL-6216 is a release blocker that is currently being looked
>> > > > > > > > >> > into.
>> > > > > > > > >> >
>> > > > > > > > >> > Ref:
>> > > > > > > > >> > DRILL-6216: Metadata mismatch when connecting to a Drill 1.12.0
>> > > > > > > > >> > with a Drill-1.13.0-SNAPSHOT driver
>> > > > > > > > >> > https://issues.apache.org/jira/browse/DRILL-6216
>> > > > > > > > >> >
>> > > > > > > > >> > Please add it to the list of required commits as well.
>> > > > > > > > >> >
>> > > > > > > > >> > Thanks
>> > > > > > > > >> > ~ Kunal
>> > > > > > > > >> > On 3/6/2018 9:53:06 AM, Volodymyr Tkach <
>> > > > vovatkac...@gmail.com>
>> > > > > > > > wrote:
>> > > > > > > > >> > Right now I haven't found the reason for the
>> > > > > > > > >> > TestDrillbitResilience.cancelAfterAllResultsProduced failure. Most
>> > > > > > > > >> > likely the query is able to complete before the cancellation
>> > > > > > > > >> > request is processed.
>> > > > > > > > >> > This test is not the only case: there is one more ignored test,
>> > > > > > > > >> > TestDrillbitResilience.cancelAfterEverythingIsCompleted, with JIRA
>> > > > > > > > >> > DRILL-3967 created for it, although the environment there is AWS.
>> > > > > > > > >> >
>> > > > > > > > >> > Maybe it makes sense to ignore this test to unblock the release
>> > > > > > > > >> > and merge the JDK8 changes?
>> > > > > > > > >> >
>> > > > > > > > >>
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
