> On Feb. 26, 2015, 9:56 p.m., Jacques Nadeau wrote:
> > exec/java-exec/src/test/java/org/apache/drill/exec/TestWithZookeeper.java, 
> > line 31
> > <https://reviews.apache.org/r/31435/diff/1/?file=876411#file876411line31>
> >
> >     why commented?  I see no downside to having an not-yet used logger in a 
> > class.

It produces a warning. I'm trying to eliminate all warnings. It's easy enough 
to uncomment it the first time someone wants to use a logger in this file.


> On Feb. 26, 2015, 9:56 p.m., Jacques Nadeau wrote:
> > exec/java-exec/src/test/java/org/apache/drill/BaseTestQuery.java, line 61
> > <https://reviews.apache.org/r/31435/diff/1/?file=876404#file876404line61>
> >
> >     This is a stylistic change.  I think you need to propose this on the 
> > list and we come to a group consensus.  As it is, this is inconsistent with 
> > the whole rest of the code base.  Please discuss on the dev list before 
> > changing this everywhere.

Done. It looks like the ayes have it.


> On Feb. 26, 2015, 9:56 p.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryManager.java,
> >  line 313
> > <https://reviews.apache.org/r/31435/diff/1/?file=876399#file876399line313>
> >
> >     No, but that is on purpose.  We keep them forever for log analysis

Once this code is put into production, and a lot of queries are issued 
constantly, this is going to grow quickly. We need to have a plan for the 
future to manage this. Submitted 
https://issues.apache.org/jira/browse/DRILL-2362 .


> On Feb. 26, 2015, 9:56 p.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java, 
> > line 215
> > <https://reviews.apache.org/r/31435/diff/1/?file=876384#file876384line215>
> >
> >     Did you reorder for a sepcific reason?  Ideally we should unregister 
> > from the coordination service before shutting things down.  In fact, we 
> > should probably also pause before shutting things down after unregistering

I checked master, and the order has not changed. And we do unregister first, at 
the top of close(). We also pause after unregistering and before shutting 
things down. I see no change from master, other than the introduction of the 
new AutoCloseables.close() instead of try-catch blocks.


> On Feb. 26, 2015, 9:56 p.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java,
> >  line 867
> > <https://reviews.apache.org/r/31435/diff/1/?file=876396#file876396line867>
> >
> >     Shouldn't this be protected?

FragmentSubmitFailures doesn't have any derived classes, so I don't see how 
that helps. It is used to communicate between the various 
FragmentSubmitListeners that are created, and setupNonRootFragments(), so they 
all need to see it. We could make it private, but then we just have to add a 
getter and an incrementer, so protection doesn't really add anything.


> On Feb. 26, 2015, 9:56 p.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java, 
> > line 343
> > <https://reviews.apache.org/r/31435/diff/1/?file=876394#file876394line343>
> >
> >     Do you make sure that Foremans no longer leak exceptions?  This was 
> > here to ensure that we get an error rather than the thread terminating and 
> > the error showing up in standard err.

Since we catch Exception in Foreman.run(), it looks like you can't get out 
without reporting the error (as the Foreman moves to the FAILURE state).


> On Feb. 26, 2015, 9:56 p.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java,
> >  line 223
> > <https://reviews.apache.org/r/31435/diff/1/?file=876396#file876396line223>
> >
> >     is this guaranteed to be called or can something above prematurely end 
> > this method via exception?

The listener deregistrations are fine, because they're just list removals. But 
to be safe, I added a try-finally around the response send, just in case that 
takes some kind of IO or RPC exception.


> On Feb. 26, 2015, 9:56 p.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java,
> >  line 659
> > <https://reviews.apache.org/r/31435/diff/1/?file=876396#file876396line659>
> >
> >     Canceled should be a terminal state from the user's perspective.  Not 
> > sure that happens here or if you manage this some other way.

Any notion of "cancelled" that the user sees is independent of this. And that 
doesn't appear to be available at on the client at this time -- this was why I 
was unable to complete any automated testing. I discovered when I tried to use 
it that QueryState never gets back to the client, and none of our tests seem to 
check it. They only look for last chunk, or some expected number of rows. I 
expect to do some work on this in DRILL-1802, and have added a TODO(DRILL-1802) 
to the code in the state machine.

Meanwhile, the code wasn't consistent wrt the principle you're proposing, 
namely, that CANCELLED is a terminal state. Some code treated it that way, and 
some didn't. The relevant issue is that Foreman.cancel() immediately effects a 
state transition to CANCELLED, and then sends cancellation requests to all 
remote fragments. When those all acknowledge this, then Foreman moves to 
COMPLETED, at which point it may clean up resources associated with the query. 
There were risks with the previous inconsistent handling, which immediately 
cleaned up, because then cancellation acknowledgements would arrive and be 
delivered to apparently broken state (via QueryManager.statusUpdate() inherited 
from FragmentStatusListener) because resources may or may not have been cleaned 
up (depending on timing), possibly causing more problems. Submitted DRILL-2370 
for this, and added a TODO(DRILL-2370) in QueryManager.statusUpdate(), even 
though this is not likely to be the site of the work that needs to be don
 e.


> On Feb. 26, 2015, 9:56 p.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java,
> >  line 132
> > <https://reviews.apache.org/r/31435/diff/1/?file=876401#file876401line132>
> >
> >     We need to make sure we have everything closed before returning 
> > success.  You've removed that functionality by eating (only logging) the 
> > exceptions.  If a query leaks anything, I want the query to fail so it 
> > shows up in all the tests.  I'm okay disabling this functionality in 
> > production mode but definitely want it to be the case in normal development 
> > and testing.  If you disagree, we should discuss further because I see this 
> > a show stopper for this patch.

It's odd that you marked this line, when that was the *only* one that threw 
exceptions if there were problems cleaning up. There were a few other calls to 
closeOutResources(false), which wouldn't report problems. But if you want to 
know "If a query leaks anything,...." we should do that consistently, even for 
the failure cases. Otherwise, if queries fail, but silently leak resources, 
then the server eventually runs out of resources anyway.

I think it's great to report such things, but we should do them always or 
never. Not just on success. Reporting during test and development is fine. I've 
altered closeOutResources() so that it uses AssertionUtil.IsAssertionsEnabled() 
to determine whether we should report or not, rather than leaving it up to the 
caller.


- Chris


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31435/#review74282
-----------------------------------------------------------


On March 3, 2015, 4:21 p.m., Chris Westin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31435/
> -----------------------------------------------------------
> 
> (Updated March 3, 2015, 4:21 p.m.)
> 
> 
> Review request for drill and Jacques Nadeau.
> 
> 
> Bugs: DRILL-2245
>     https://issues.apache.org/jira/browse/DRILL-2245
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> DRILL-2245-core: Clean up query setup and execution kickoff in 
> Foreman/WorkManager in order to ensure consistent handling, and avoid hangs 
> and races, with the goal of improving Drillbit robustness.
> 
>     I did my best to keep these clean when I split them up, but this core 
> commit
>     may depend on some minor changes in the hygiene commit that is also
>     associated with this bug, so either both should be applied, or neither.
>     The core commit should be applied first.
> 
>     AutoCloseables
>     - created org.apache.drill.common.AutoCloseables to handle closing these
>       quietly
> 
>     BaseTestQuery, and derivatives
>     - factored out pieces into QueryTestUtil so they can be reused
> 
>     Drillbit
>     - uses AutoCloseables for the WorkManager and for the storeProvider
>     - allow start() to take a RemoteServiceSet
>     - private, final, formatting
> 
>     Foreman
>        - does not need to implement Comparable
>     - does not need to implement Closeable
>     - thread blocking fixes
>     - add resultSent flag
>     - add code to log plan fragments with endpoint assignments
>     - added finals, cleaned up formatting
>     - do queue management in acquireQuerySemaphore; local tests pass
>     - rename getContext() to getQueryContext()
>     - retain DrillbitContext
>     - a couple of exception injections for testing
>     - minor formatting
>     - TODOs
> 
>     FragmentExecutor
>     - eliminated CancelableQuery
>     - common subexpression elimination
>     - cleaned up
> 
>     QueryContext
>     - removed unnecessary functions (with some outside classes tweaked for 
> this)
>     - finals, formatting
> 
>     QueryManager
>     - merge in QueryStatus
>       - affects Foreman, ../batch/ControlHandlerImpl,
>         and ../../server/rest/ProfileResources
>     - made some methods private
>     - removed unused imports
>     - add finals and formatting
>     - variable renaming to improve readability
>     - formatting
>     - comments
>     - TODOs
> 
>     QueryStatus
>     - getAsInfo() private
>     - member renaming
>     - member access changes
>     - formatting
>     - TODOs
> 
>     QueryTestUtil, BaseTestQuery, TestDrillbitResilience
>     - make maxWidth a parameter to server startup
> 
>     SelfCleaningRunnable
>     - created org.apache.drill.common.SelfCleaningRunnable
> 
>     SingleRowListener
>     - created org.apache.drill.SingleRowListener results listener
>     - use in TestDrillbitResilience
>     
>     TestDrillbitResilience
>     - created org.apache.drill.exec.server.TestDrillbitResilience to test 
> drillb
>       resilience in the face of exceptions and failures during queries
> 
>     TestWithZookeeper
>     - factor out work into ZookeeperHelper so that it can be reused by
>       TestDrillbitResilience
> 
>     UserBitShared
>     - get rid of unused UNKNOWN_QUERY
> 
>     WorkEventBus
>     - rename methods, affects Foreman and ControlHandlerImpl
>     - remove unused WorkerBee reference
>     - most members final
>     - formatting
> 
>     WorkManager
>     - Closeable to AutoCloseable
>     - removed unused incomingFragments Set
>     - eliminated unnecessary eventThread and pendingTasks by posting Runnables
>       directly to executor
>     - use SelfCleaningRunnable for Foreman management
>     - FragmentExecutor management uses SelfCleaningRunnable
>     - runningFragments to be a ConcurrentHashMap; TestTpchDistributed passes
>     - other improvements due to bee no longer needed in various places
>     - most members final
>     - minor formatting
>     - comments
>     - TODOs
> 
>     (*) Created exception injection classes to simulate exceptions for testing
>     - ExceptionInjection
>     - ExceptionInjector
>     - ExceptionInjectionUtil
>     - TestExceptionInjection
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/drill/common/AutoCloseables.java 
> PRE-CREATION 
>   common/src/main/java/org/apache/drill/common/SelfCleaningRunnable.java 
> PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 
> 5efcce8 
>   exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java 
> 04b955b 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/coord/local/LocalClusterCoordinator.java
>  035c1aa 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/rpc/control/WorkEventBus.java
>  d6b8637 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java 
> 67342c4 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/server/DrillbitContext.java
>  83a89df 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
>  aa0a5ad 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java
>  ae04bad 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/testing/ExceptionInjection.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/testing/ExceptionInjector.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/testing/InjectionSite.java 
> PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/testing/SimulatedExceptions.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/CancelableQuery.java 
> 5b11943 
>   exec/java-exec/src/main/java/org/apache/drill/exec/work/StatusProvider.java 
> 6086f74 
>   exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java 
> 99c6ab8 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/batch/ControlHandlerImpl.java
>  3228da9 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java 
> 378e81a 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/FragmentData.java
>  52fd0a9 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/FragmentStatusListener.java
>  6a719d2 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryManager.java
>  2de3592 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryStatus.java
>  4e18da6 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java
>  7ccb64e 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/NonRootFragmentManager.java
>  3671804 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/RootFragmentManager.java
>  54fc8c4 
>   exec/java-exec/src/test/java/org/apache/drill/BaseTestQuery.java cf99577 
>   exec/java-exec/src/test/java/org/apache/drill/PlanTestBase.java c52545d 
>   exec/java-exec/src/test/java/org/apache/drill/QueryTestUtil.java 
> PRE-CREATION 
>   exec/java-exec/src/test/java/org/apache/drill/SingleRowListener.java 
> PRE-CREATION 
>   exec/java-exec/src/test/java/org/apache/drill/TestBugFixes.java 51f3121 
>   exec/java-exec/src/test/java/org/apache/drill/TestBuilder.java 978e565 
>   exec/java-exec/src/test/java/org/apache/drill/exec/DrillSystemTestBase.java 
> 75ba3a9 
>   exec/java-exec/src/test/java/org/apache/drill/exec/TestWithZookeeper.java 
> bb69c9a 
>   exec/java-exec/src/test/java/org/apache/drill/exec/ZookeeperHelper.java 
> PRE-CREATION 
>   
> exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestOptiqPlans.java
>  4aaaa78 
>   
> exec/java-exec/src/test/java/org/apache/drill/exec/server/TestDrillbitResilience.java
>  PRE-CREATION 
>   
> exec/java-exec/src/test/java/org/apache/drill/exec/testing/ExceptionInjectionUtil.java
>  PRE-CREATION 
>   
> exec/java-exec/src/test/java/org/apache/drill/exec/testing/TestExceptionInjection.java
>  PRE-CREATION 
>   protocol/src/main/protobuf/UserBitShared.proto 8f05c45 
> 
> Diff: https://reviews.apache.org/r/31435/diff/
> 
> 
> Testing
> -------
> 
> mvn install
> Functional - Passing - New
> Advanced - TPCH SF100 - Parquet
> 
> 
> Thanks,
> 
> Chris Westin
> 
>

Reply via email to