pan3793 commented on code in PR #47402:
URL: https://github.com/apache/spark/pull/47402#discussion_r1701610965


##########
bin/spark-shell:
##########
@@ -44,8 +44,53 @@ Scala REPL options:
 # through spark.driver.extraClassPath is not automatically propagated.
 SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Dscala.usejavacp=true"
 
+# To start the Spark Connect shell, we have to detect whether spark.remote
+# or --remote is set by scanning the arguments directly; Spark Connect does
+# not support loading configurations yet.
+connect_shell=false
+# Track the previous argument so that "--conf spark.remote=..." and
+# "-c spark.remote=..." can be matched as flag/value pairs.
+cur_arg="$0"
+for arg in "${@:1}"
+do
+  # --conf spark.remote=... or -c spark.remote=...
+  if [[ $cur_arg == "--conf" || $cur_arg == "-c" ]]; then
+    if [[ $arg == "spark.remote"* ]]; then
+      connect_shell=true
+    fi
+  fi
+
+  # --conf=spark.remote=... or -c=spark.remote=...
+  if [[ $arg == "--conf=spark.remote"* || $arg == "-c=spark.remote"* ]]; then
+    connect_shell=true
+  fi
+
+  # --remote= or --remote
+  if [[ $arg == "--remote"* ]]; then
+    connect_shell=true
+  fi
+  cur_arg=$arg
+done
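+
+# For example, each of these invocations should set connect_shell=true
+# (the sc:// URL is only illustrative):
+#   ./bin/spark-shell --remote sc://localhost:15002
+#   ./bin/spark-shell --conf spark.remote=sc://localhost:15002
+#   ./bin/spark-shell -c spark.remote=sc://localhost:15002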
+
 function main() {
-  if $cygwin; then
+  if $connect_shell; then
+     export SPARK_SUBMIT_OPTS
+     export SPARK_CONNECT_SHELL=1
+     if [ -d "${SPARK_HOME}/jars" ]; then
+       # Production code path
+       coordinate=$(find "${SPARK_HOME}/jars" -type f -name 'spark-connect_*')
+       coordinate=$(basename "$coordinate")
+       # Peel the version string off in steps (bash has no nested parameter
+       # expansion), e.g. spark-connect_2.13-4.0.0.jar -> 2.13 / 4.0.0
+       verstr=${coordinate%.jar*}
+       verstr=${verstr##*_}
+       sparkver=${verstr#*-}
+       scalaver=${verstr%%-*}
+       "${SPARK_HOME}"/bin/spark-submit \
+         --class org.apache.spark.sql.application.ConnectRepl \
+         --packages com.lihaoyi:ammonite_2.13.14:3.0.0-M2,org.apache.spark:spark-connect-client-jvm_$scalaver:$sparkver \
+         --name "Connect shell" "$@"

Review Comment:
   Is this a temporary workaround, or is it designed as a long-term solution?
   
   I know that we cannot put `spark-connect-client-jvm` into `jars` due to class name conflicts, but this approach requires users to download the jars from the internet (or a private Maven repo) the first time they run `spark-shell --remote xxx`, which may not be acceptable for deployments with restricted internet access.
   
   How about including these jars in the Spark binary tgz, under a different folder?
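   
   As a rough sketch of the idea (the `connect-repl` folder name is just a suggestion, and this assumes `SPARK_HOME` contains no spaces), the launcher could prefer locally bundled jars and only fall back to `--packages` when they are absent:
   
       if [ -d "${SPARK_HOME}/connect-repl" ]; then
         # hypothetical layout: client jars shipped inside the binary tgz
         jars=$(echo "${SPARK_HOME}"/connect-repl/*.jar | tr ' ' ',')
         "${SPARK_HOME}"/bin/spark-submit \
           --class org.apache.spark.sql.application.ConnectRepl \
           --jars "$jars" \
           --name "Connect shell" "$@"
       fi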



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

