Reamer commented on code in PR #4936:
URL: https://github.com/apache/zeppelin/pull/4936#discussion_r2121065097
##########
spark-submit/src/main/java/org/apache/zeppelin/spark/submit/SparkSubmitInterpreter.java:
##########
@@ -37,32 +36,72 @@
 /**
- * Support %spark-submit which run spark-submit command. Internally,
- * it would run shell command via ShellInterpreter.
- *
+ * Interpreter that supports the `%spark-submit` command in Apache Zeppelin.
+ * <p>
+ * This interpreter allows users to submit Spark jobs using the standard `spark-submit` CLI
+ * interface.
+ * Internally, it delegates execution to the ShellInterpreter to run `spark-submit` as a shell
+ * command.
+ * <p>
+ * Key features:
+ * - Automatically builds and executes the `spark-submit` command using the configured SPARK_HOME
+ *   path.
+ * - Extracts the Spark UI URL from logs and publishes it to the Zeppelin frontend.
+ * - Tracks the YARN Application ID from the logs, allowing the job to be cancelled via
+ *   `yarn application -kill`.
+ * - Handles both YARN and local Spark modes.
+ * <p>
+ * Required configuration:
+ * - SPARK_HOME must be set in the interpreter setting or environment variables. It should point
+ *   to the root directory of a valid Spark installation.
+ * <p>
+ * Example usage in a Zeppelin notebook:
+ * %spark-submit --class org.apache.spark.examples.SparkPi /path/to/jar spark-args
 */
 public class SparkSubmitInterpreter extends ShellInterpreter {
   private static final Logger LOGGER = LoggerFactory.getLogger(SparkSubmitInterpreter.class);
-  private String sparkHome;
-
-  // paragraphId --> yarnAppId
+  private final String sparkHome;
   private ConcurrentMap<String, String> yarnAppIdMap = new ConcurrentHashMap<>();
   public SparkSubmitInterpreter(Properties property) {
     super(property);
-    // Set time to be max integer so that the shell process won't timeout.
-    setProperty("shell.command.timeout.millisecs", Integer.MAX_VALUE + "");
-    this.sparkHome = properties.getProperty("SPARK_HOME");
+    setProperty("shell.command.timeout.millisecs", String.valueOf(Integer.MAX_VALUE));
+    this.sparkHome = property.getProperty("SPARK_HOME");
     LOGGER.info("SPARK_HOME: {}", sparkHome);
   }
+  /**
+   * Executes a spark-submit command based on the user's input in a Zeppelin notebook paragraph.
+   * <p>
+   * This method constructs the full spark-submit CLI command using the configured SPARK_HOME and
+   * the provided arguments. It performs validation (e.g., SPARK_HOME presence), logs the
+   * execution, and registers a listener to extract Spark UI information from the output logs.
+   * <p>
+   * If SPARK_HOME is not set, an error result is returned.
+   * After execution, any associated YARN application ID is removed from the internal tracking map.
+   *
+   * @param cmd     The spark-submit arguments entered by the user (e.g., "--class ... /path/to/jar").
+   * @param context The interpreter context for the current paragraph execution.
+   * @return An {@link InterpreterResult} representing the outcome of the spark-submit execution.
+   */
   @Override
   public InterpreterResult internalInterpret(String cmd, InterpreterContext context) {
     if (StringUtils.isBlank(cmd)) {
       return new InterpreterResult(InterpreterResult.Code.SUCCESS);
     }
+
+    if (StringUtils.isBlank(sparkHome)) {
+      String errorMsg = "SPARK_HOME is not set. Please configure SPARK_HOME in the interpreter " +
+          "setting or environment.";
+      LOGGER.error("Failed to run spark-submit: {}", errorMsg);
+      return new InterpreterResult(InterpreterResult.Code.ERROR, errorMsg);
+    }
+
     String sparkSubmitCommand = sparkHome + "/bin/spark-submit " + cmd.trim();
Review Comment:
I think it is important that we also check the existence of the spark-submit
command. Something like this (it also gets rid of the hard-coded path
separators):
```
File sparkSubmit = Paths.get(sparkHome, "bin", "spark-submit").toFile();
if (sparkSubmit.exists()) {
```
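Expanded into the surrounding method, the check could look roughly like this
(a sketch only; the error message wording and the reuse of the existing error
pattern are illustrative, not part of the PR):
```
// Hypothetical sketch, assuming java.io.File and java.nio.file.Paths are imported.
// Paths.get() joins the segments with the platform's separator, so no "/" literals.
File sparkSubmit = Paths.get(sparkHome, "bin", "spark-submit").toFile();
if (!sparkSubmit.exists()) {
  String errorMsg = "spark-submit not found at: " + sparkSubmit.getAbsolutePath();
  LOGGER.error("Failed to run spark-submit: {}", errorMsg);
  return new InterpreterResult(InterpreterResult.Code.ERROR, errorMsg);
}
String sparkSubmitCommand = sparkSubmit.getAbsolutePath() + " " + cmd.trim();
```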
##########
spark-submit/src/main/java/org/apache/zeppelin/spark/submit/SparkSubmitInterpreter.java:
##########
@@ -37,32 +36,72 @@
 /**
- * Support %spark-submit which run spark-submit command. Internally,
- * it would run shell command via ShellInterpreter.
- *
+ * Interpreter that supports the `%spark-submit` command in Apache Zeppelin.
+ * <p>
+ * This interpreter allows users to submit Spark jobs using the standard `spark-submit` CLI
+ * interface.
+ * Internally, it delegates execution to the ShellInterpreter to run `spark-submit` as a shell
+ * command.
+ * <p>
+ * Key features:
+ * - Automatically builds and executes the `spark-submit` command using the configured SPARK_HOME
+ *   path.
+ * - Extracts the Spark UI URL from logs and publishes it to the Zeppelin frontend.
+ * - Tracks the YARN Application ID from the logs, allowing the job to be cancelled via
+ *   `yarn application -kill`.
+ * - Handles both YARN and local Spark modes.
+ * <p>
+ * Required configuration:
+ * - SPARK_HOME must be set in the interpreter setting or environment variables. It should point
+ *   to the root directory of a valid Spark installation.
+ * <p>
+ * Example usage in a Zeppelin notebook:
+ * %spark-submit --class org.apache.spark.examples.SparkPi /path/to/jar spark-args
 */
 public class SparkSubmitInterpreter extends ShellInterpreter {
   private static final Logger LOGGER = LoggerFactory.getLogger(SparkSubmitInterpreter.class);
-  private String sparkHome;
-
-  // paragraphId --> yarnAppId
Review Comment:
Please leave the comment in the code.
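For example, restore it above the new field declaration (field as in the PR,
only the comment added back):
```
// paragraphId --> yarnAppId
private ConcurrentMap<String, String> yarnAppIdMap = new ConcurrentHashMap<>();
```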
##########
spark-submit/src/main/java/org/apache/zeppelin/spark/submit/SparkSubmitInterpreter.java:
##########
@@ -37,32 +36,72 @@
 /**
- * Support %spark-submit which run spark-submit command. Internally,
- * it would run shell command via ShellInterpreter.
- *
+ * Interpreter that supports the `%spark-submit` command in Apache Zeppelin.
+ * <p>
+ * This interpreter allows users to submit Spark jobs using the standard `spark-submit` CLI
+ * interface.
+ * Internally, it delegates execution to the ShellInterpreter to run `spark-submit` as a shell
+ * command.
+ * <p>
+ * Key features:
+ * - Automatically builds and executes the `spark-submit` command using the configured SPARK_HOME
+ *   path.
+ * - Extracts the Spark UI URL from logs and publishes it to the Zeppelin frontend.
+ * - Tracks the YARN Application ID from the logs, allowing the job to be cancelled via
+ *   `yarn application -kill`.
+ * - Handles both YARN and local Spark modes.
+ * <p>
+ * Required configuration:
+ * - SPARK_HOME must be set in the interpreter setting or environment variables. It should point
+ *   to the root directory of a valid Spark installation.
+ * <p>
+ * Example usage in a Zeppelin notebook:
+ * %spark-submit --class org.apache.spark.examples.SparkPi /path/to/jar spark-args
 */
 public class SparkSubmitInterpreter extends ShellInterpreter {
   private static final Logger LOGGER = LoggerFactory.getLogger(SparkSubmitInterpreter.class);
-  private String sparkHome;
-
-  // paragraphId --> yarnAppId
+  private final String sparkHome;
   private ConcurrentMap<String, String> yarnAppIdMap = new ConcurrentHashMap<>();
   public SparkSubmitInterpreter(Properties property) {
     super(property);
-    // Set time to be max integer so that the shell process won't timeout.
-    setProperty("shell.command.timeout.millisecs", Integer.MAX_VALUE + "");
-    this.sparkHome = properties.getProperty("SPARK_HOME");
+    setProperty("shell.command.timeout.millisecs", String.valueOf(Integer.MAX_VALUE));
+    this.sparkHome = property.getProperty("SPARK_HOME");
Review Comment:
The properties are passed all the way through to the `Interpreter` class.
https://github.com/apache/zeppelin/blob/1468714b5b187bafe83ff0bf13cac3be50642c4e/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/Interpreter.java#L156-L158
We should adopt the `Interpreter` class's variable name, `properties`, here as
well.
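Concretely, the constructor would then read (a sketch; only the parameter name
changes, behavior is identical):
```
public SparkSubmitInterpreter(Properties properties) {
  super(properties);
  setProperty("shell.command.timeout.millisecs", String.valueOf(Integer.MAX_VALUE));
  this.sparkHome = properties.getProperty("SPARK_HOME");
  LOGGER.info("SPARK_HOME: {}", sparkHome);
}
```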