Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/13352 )
Change subject: IMPALA-8473: publish lineage info via hook ...................................................................... Patch Set 14: (7 comments) Thanks Radford for working through these issues! It is worth getting this stuff right. What is the quality of service being provided by the Hook Manager. Do we guarantee that we will call hooks? If a query ends at time t, is there any guarantee of when we will call the hook (like with 10 seconds of t)? What do we do if hook execution hangs forever? If we guarantee delivery in a timely fashion then at some point failing to execute hooks in a timely fashion should be detected and something dramatic should happen like queries being failed. We probably don't want to fail by getting OutOfMemory. If we don't guarantee delivery in a timely fashion then at some point we should stop queuing work and start throwing it away. If things do go wrong we want to be able to diagnose what is happening. Logging is good but ideally we would also expose hook execution metrics in the webui. http://gerrit.cloudera.org:8080/#/c/13352/14/fe/src/main/java/org/apache/impala/hooks/QueryExecHook.java File fe/src/main/java/org/apache/impala/hooks/QueryExecHook.java: http://gerrit.cloudera.org:8080/#/c/13352/14/fe/src/main/java/org/apache/impala/hooks/QueryExecHook.java@59 PS14, Line 59: void postQueryExecute(PostQueryHookContext context); Would it be useful to the Hook implementor to have some sort of Log object passed in so that the hook implementation can log? http://gerrit.cloudera.org:8080/#/c/13352/12/fe/src/main/java/org/apache/impala/hooks/QueryExecHookManager.java File fe/src/main/java/org/apache/impala/hooks/QueryExecHookManager.java: http://gerrit.cloudera.org:8080/#/c/13352/12/fe/src/main/java/org/apache/impala/hooks/QueryExecHookManager.java@131 PS12, Line 131: } catch (InstantiationException > Yes, any exception thrown from any hook during startup will kill the entire Thanks, I did miss the javadoc earlier. http://gerrit.cloudera.org:8080/#/c/13352/12/fe/src/main/java/org/apache/impala/hooks/QueryExecHookManager.java@178 PS12, Line 178: * asynchronously, returning immediately with a List of {@link Future}s > Still WIP: I intend to log the exception and simply swallow it. (See desig I missed this design doc, is there a link? http://gerrit.cloudera.org:8080/#/c/13352/15/fe/src/main/java/org/apache/impala/hooks/QueryExecHookManager.java File fe/src/main/java/org/apache/impala/hooks/QueryExecHookManager.java: http://gerrit.cloudera.org:8080/#/c/13352/15/fe/src/main/java/org/apache/impala/hooks/QueryExecHookManager.java@203 PS15, Line 203: return hookExecutor_.submit(() -> { As I understand it, submit() adds the work to some sort of queue. So if the hook execution is blocked, we will just continue to add work to the queue. So if the hook is blocked, we will eventually fill up memory? http://gerrit.cloudera.org:8080/#/c/13352/14/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/13352/14/fe/src/main/java/org/apache/impala/service/Frontend.java@1532 PS14, Line 1532: // TODO: can we make use of the futures to implement better I think this is an important piece of design that we want to understand. http://gerrit.cloudera.org:8080/#/c/13352/6/fe/src/main/java/org/apache/impala/service/JniFrontend.java File fe/src/main/java/org/apache/impala/service/JniFrontend.java: http://gerrit.cloudera.org:8080/#/c/13352/6/fe/src/main/java/org/apache/impala/service/JniFrontend.java@29 PS6, Line 29: import org.apache.hadoop.fs.adl.AdlFileSystem; : import org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem; : import org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem; : import org.apache.hadoop.fs.s3a.S3AFileSystem; > typical IDE settings reordering the imports. is there an impala-standard s We use clang-format to format java as well as C++. (I know this is a bit obscure). You can get plugins for IDEs http://gerrit.cloudera.org:8080/#/c/13352/14/tests/hooks/test_hooks.py File tests/hooks/test_hooks.py: http://gerrit.cloudera.org:8080/#/c/13352/14/tests/hooks/test_hooks.py@156 PS14, Line 156: def test_query_exec_hooks_execute(self, unique_role, unique_name): > Cannot think of a way to deterministically test that a hook has executed ou If the hook write to a file, then the test could wait (with a timeout) for the file to have the correct number of lines -- To view, visit http://gerrit.cloudera.org:8080/13352 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I23a896537a98bfef07fb27c70e9a87c105cd77a1 Gerrit-Change-Number: 13352 Gerrit-PatchSet: 14 Gerrit-Owner: radford nguyen <radford.ngu...@gmail.com> Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com> Gerrit-Reviewer: Anonymous Coward (498) Gerrit-Reviewer: Austin Nobis <ano...@cloudera.com> Gerrit-Reviewer: Fredy Wijaya <fwij...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: radford nguyen <radford.ngu...@gmail.com> Gerrit-Comment-Date: Mon, 20 May 2019 17:05:52 +0000 Gerrit-HasComments: Yes