Andrew Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13352 )

Change subject: IMPALA-8473: publish lineage info via hook
......................................................................


Patch Set 14:

(7 comments)

Thanks Radford for working through these issues! It is worth getting this stuff 
right.

What is the quality of service being provided by the Hook Manager.  Do we 
guarantee that we will call hooks? If a query ends at time t, is there any 
guarantee of when we will call the hook (like with 10 seconds of t)? What do we 
do if hook execution hangs forever?
If we guarantee delivery in a timely fashion then at some point failing to 
execute hooks in a timely fashion should be detected and something dramatic 
should happen like queries being failed. We probably don't want to fail by 
getting OutOfMemory.
If we don't guarantee delivery in a timely fashion then at some point we should 
stop queuing work and start throwing it away.

If things do go wrong we want to be able to diagnose what is happening. Logging 
is good but ideally we would also expose hook execution metrics in the webui.

http://gerrit.cloudera.org:8080/#/c/13352/14/fe/src/main/java/org/apache/impala/hooks/QueryExecHook.java
File fe/src/main/java/org/apache/impala/hooks/QueryExecHook.java:

http://gerrit.cloudera.org:8080/#/c/13352/14/fe/src/main/java/org/apache/impala/hooks/QueryExecHook.java@59
PS14, Line 59:   void postQueryExecute(PostQueryHookContext context);
Would it be useful to the Hook implementor to have some sort of Log object 
passed in so that the hook implementation can log?


http://gerrit.cloudera.org:8080/#/c/13352/12/fe/src/main/java/org/apache/impala/hooks/QueryExecHookManager.java
File fe/src/main/java/org/apache/impala/hooks/QueryExecHookManager.java:

http://gerrit.cloudera.org:8080/#/c/13352/12/fe/src/main/java/org/apache/impala/hooks/QueryExecHookManager.java@131
PS12, Line 131:       } catch (InstantiationException
> Yes, any exception thrown from any hook during startup will kill the entire
Thanks, I did miss the javadoc earlier.


http://gerrit.cloudera.org:8080/#/c/13352/12/fe/src/main/java/org/apache/impala/hooks/QueryExecHookManager.java@178
PS12, Line 178:    * asynchronously, returning immediately with a List of 
{@link Future}s
> Still WIP: I intend to log the exception and simply swallow it.  (See desig
I missed this design doc, is there a link?


http://gerrit.cloudera.org:8080/#/c/13352/15/fe/src/main/java/org/apache/impala/hooks/QueryExecHookManager.java
File fe/src/main/java/org/apache/impala/hooks/QueryExecHookManager.java:

http://gerrit.cloudera.org:8080/#/c/13352/15/fe/src/main/java/org/apache/impala/hooks/QueryExecHookManager.java@203
PS15, Line 203:       return hookExecutor_.submit(() -> {
As I understand it, submit() adds the work to some sort of queue. So if the 
hook execution is blocked, we will just continue to add work to the queue. So 
if the hook is blocked, we will eventually fill up memory?


http://gerrit.cloudera.org:8080/#/c/13352/14/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/13352/14/fe/src/main/java/org/apache/impala/service/Frontend.java@1532
PS14, Line 1532:     // TODO: can we make use of the futures to implement better
I think this is an important piece of design that we want to understand.


http://gerrit.cloudera.org:8080/#/c/13352/6/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/13352/6/fe/src/main/java/org/apache/impala/service/JniFrontend.java@29
PS6, Line 29: import org.apache.hadoop.fs.adl.AdlFileSystem;
            : import org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem;
            : import org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem;
            : import org.apache.hadoop.fs.s3a.S3AFileSystem;
> typical IDE settings reordering the imports.  is there an impala-standard s
We use clang-format to format java as well as C++. (I know this is a bit 
obscure). You can get plugins for IDEs


http://gerrit.cloudera.org:8080/#/c/13352/14/tests/hooks/test_hooks.py
File tests/hooks/test_hooks.py:

http://gerrit.cloudera.org:8080/#/c/13352/14/tests/hooks/test_hooks.py@156
PS14, Line 156:   def test_query_exec_hooks_execute(self, unique_role, 
unique_name):
> Cannot think of a way to deterministically test that a hook has executed ou
If the hook write to a file, then the test could wait (with a timeout) for the 
file to have the correct number of lines



--
To view, visit http://gerrit.cloudera.org:8080/13352
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I23a896537a98bfef07fb27c70e9a87c105cd77a1
Gerrit-Change-Number: 13352
Gerrit-PatchSet: 14
Gerrit-Owner: radford nguyen <radford.ngu...@gmail.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (498)
Gerrit-Reviewer: Austin Nobis <ano...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fwij...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: radford nguyen <radford.ngu...@gmail.com>
Gerrit-Comment-Date: Mon, 20 May 2019 17:05:52 +0000
Gerrit-HasComments: Yes

Reply via email to