Reamer commented on code in PR #4691:
URL: https://github.com/apache/zeppelin/pull/4691#discussion_r1406119893
##########
rlang/pom.xml:
##########
@@ -116,18 +116,10 @@
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>hadoop-client-runtime</artifactId>
       <version>${hadoop.version}</version>
       <scope>compile</scope>
     </dependency>
-
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-common</artifactId>
-      <version>${hadoop.version}</version>
-      <scope>compile</scope>
-    </dependency>
-
     <dependency>

Review Comment:

> There is a switch in YARN to enable/disable Hadoop class population for containers. I don't know how this is used in Zeppelin.

> QQ, I understand we should not include Hadoop classes in plugins, because they will be loaded into the same JVM with Zeppelin server, so that they can share the Hadoop classes. What about the interpreters? I assume the interpreters are always run in dedicated JVMs, so Hadoop classes seem always necessary (except for those runtimes who already provided Hadoop classes, e.g. Spark, Flink)?

Correct, the Zeppelin server and zengine run in the same JVM as the Zeppelin plugins. As far as I know, the interpreters usually run in separate JVM instances. Nevertheless, I think we should set the scope of the Hadoop dependency to `provided` in the interpreter, because, as far as I can tell, the interpreter uses Hadoop code only for YARN. See https://github.com/apache/zeppelin/blob/56da029ffe413c55ba34f46e4e4b91b8d20d9ce2/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/YarnUtils.java#L20

Maybe we will find a way to remove the dependency entirely at some point.
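A minimal sketch of the `provided` variant, mirroring the dependency from the diff with only the scope changed. It assumes, without having verified it, that whatever launches the interpreter JVM (for example the YARN container setup) puts the Hadoop classes on the classpath at runtime:

```xml
<!-- Sketch only: same artifact as in the diff, scope switched to provided.
     Assumption: the environment that starts the interpreter JVM
     (e.g. a YARN container) supplies the Hadoop classes at runtime. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>${hadoop.version}</version>
  <scope>provided</scope>
</dependency>
```

With `provided`, Maven keeps the artifact on the compile and test classpaths but does not treat it as a runtime or transitive dependency, so packaging that collects runtime dependencies normally leaves Hadoop out of the interpreter.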