Philipp Dallig created ZEPPELIN-5897:
----------------------------------------
             Summary: Spark-Interpreter context change
                 Key: ZEPPELIN-5897
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-5897
             Project: Zeppelin
          Issue Type: Bug
          Components: spark
            Reporter: Philipp Dallig

I have encountered some strange behaviour in the Spark interpreter. The problem occurs when several cron jobs are started in parallel. The launch command itself looks correct:

{code:java}
[INFO] Interpreter launch command: /opt/conda/lib/python3.9/site-packages/pyspark/bin/spark-submit --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer --driver-class-path /usr/share/java/*:/tmp/local-repo/spark_8g_8g/*:/opt/zeppelin/interpreter/spark/*:::/opt/zeppelin/interpreter/zeppelin-interpreter-shaded-0.11.0-SNAPSHOT.jar:/opt/zeppelin/interpreter/spark/spark-interpreter-0.11.0-SNAPSHOT.jar --driver-java-options -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///opt/zeppelin/conf/log4j.properties -Dlog4j.configurationFile=file:///opt/zeppelin/conf/log4j2.properties -Dzeppelin.log.file=/opt/zeppelin/logs/zeppelin-interpreter-spark_8g_8g-isolated-2G8V2J18D-2023-04-11_00-00-00--spark8g8g-isolated-2g8v2j18d-2023-04-1100-00-00-upuren.log --conf spark.driver.maxResultSize=8g --conf spark.kubernetes.executor.request.cores=0. --conf spark.network.timeout=1800 --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog --verbose --conf spark.jars.ivySettings=/opt/spark/ivysettings.xml --proxy-user ejavaheri --conf spark.master=k8s://https://kubernetes.default.svc --conf spark.driver.memory=8g --conf spark.driver.cores=2 --conf spark.app.name=spark_8g_8g --conf spark.driver.host=spark8g8g-isolated-2g8v2j18d-2023-04-1100-00-00-upuren.spark.svc --conf spark.kubernetes.memoryOverheadFactor=0.4 --conf spark.webui.yarn.useProxy=false --conf spark.blockManager.port=22322 --conf spark.driver.port=22321 --conf spark.driver.bindAddress=0.0.0.0 --conf spark.kubernetes.namespace=spark --conf spark.kubernetes.driver.request.cores=200m --conf spark.kubernetes.driver.pod.name=spark8g8g-isolated-2g8v2j18d-2023-04-1100-00-00-upuren --conf spark.executor.instances=1 --conf spark.executor.memory=8g --conf spark.executor.cores=4 --conf spark.submit.deployMode=client --conf spark.kubernetes.container.image=harbor.mycompany.com/dap/zeppelin-executor:3.3 /opt/zeppelin/interpreter/spark/spark-interpreter-0.11.0-SNAPSHOT.jar zeppelin-server.spark.svc 12320 spark_8g_8g-isolated-2G8V2J18D-2023-04-11_00-00-00 12321:12321
{code}

As you can see, the config value `spark.driver.host` is `spark8g8g-isolated-2g8v2j18d-2023-04-1100-00-00-upuren.spark.svc`, which is correct. During start-up, however, the host seems to change. The new name is:

{code:java}
spark2g4g-isolated-2d8reueys-2023-04-1100-00-00-fbvrgw.spark.svc
{code}

This new name is the hostname of the other cron job running in parallel. How is it possible for the Spark driver host to change? Does Zeppelin even have a way to do this?
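For reference, here is a minimal diagnostic sketch that could be run in a paragraph of the affected interpreter to compare the driver host handed over by spark-submit with the one the live SparkContext reports. It assumes only the standard `sc` (SparkContext) binding that the Spark interpreter injects into each paragraph; in client deploy mode, spark-submit exposes each `--conf` as a `spark.*` JVM system property, which `SparkConf` loads as a default, so both values should normally match:

{code:scala}
// Minimal diagnostic sketch. Assumes only the standard `sc` binding
// that Zeppelin's Spark interpreter provides in each paragraph.
// spark-submit (client deploy mode) sets every --conf as a spark.*
// JVM system property, and SparkConf picks those up as defaults,
// so the two values below should normally be identical.
println("JVM system property: " + sys.props.getOrElse("spark.driver.host", "<unset>"))
println("SparkContext conf  : " + sc.getConf.getOption("spark.driver.host").getOrElse("<unset>"))
{code}

In the failing session, the second line would presumably print the other cron job's hostname, consistent with the `Added JAR ... at spark://spark2g4g-...` entry in the startup log below.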
{code:java}
INFO [2023-04-11 00:00:04,288] ({RegisterThread} RemoteInterpreterServer.java[run]:620) - Start registration
INFO [2023-04-11 00:00:04,288] ({RemoteInterpreterServer-Thread} RemoteInterpreterServer.java[run]:200) - Launching ThriftServer at 10.129.4.191:12321
INFO [2023-04-11 00:00:05,409] ({RegisterThread} RemoteInterpreterServer.java[run]:634) - Registering interpreter process
INFO [2023-04-11 00:00:05,433] ({RegisterThread} RemoteInterpreterServer.java[run]:636) - Registered interpreter process
INFO [2023-04-11 00:00:05,433] ({RegisterThread} RemoteInterpreterServer.java[run]:657) - Registration finished
WARN [2023-04-11 00:00:05,517] ({pool-3-thread-1} ZeppelinConfiguration.java[<init>]:87) - Failed to load XML configuration, proceeding with a default,for a stacktrace activate the debug log
INFO [2023-04-11 00:00:05,522] ({pool-3-thread-1} ZeppelinConfiguration.java[create]:137) - Server Host: 127.0.0.1
INFO [2023-04-11 00:00:05,523] ({pool-3-thread-1} ZeppelinConfiguration.java[create]:144) - Zeppelin Version: 0.11.0-SNAPSHOT
INFO [2023-04-11 00:00:05,522] ({pool-3-thread-1} ZeppelinConfiguration.java[create]:141) - Server Port: 8080
INFO [2023-04-11 00:00:05,523] ({pool-3-thread-1} ZeppelinConfiguration.java[create]:143) - Context Path: /
INFO [2023-04-11 00:00:05,531] ({pool-3-thread-1} RemoteInterpreterServer.java[createLifecycleManager]:293) - Creating interpreter lifecycle manager: org.apache.zeppelin.interpreter.lifecycle.TimeoutLifecycleManager
INFO [2023-04-11 00:00:05,535] ({pool-3-thread-1} RemoteInterpreterServer.java[init]:236) - Creating RemoteInterpreterEventClient with connection pool size: 100
INFO [2023-04-11 00:00:05,535] ({pool-3-thread-1} TimeoutLifecycleManager.java[onInterpreterProcessStarted]:73) - Interpreter process: spark_8g_8g-isolated-2G8V2J18D-2023-04-11_00-00-00 is started
INFO [2023-04-11 00:00:05,535] ({pool-3-thread-1} TimeoutLifecycleManager.java[<init>]:67) - TimeoutLifecycleManager is started with checkInterval: 60000, timeoutThreshold: 3600000
INFO [2023-04-11 00:00:05,627] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.SparkInterpreter, isForceShutdown: true
INFO [2023-04-11 00:00:05,635] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.SparkSqlInterpreter, isForceShutdown: true
INFO [2023-04-11 00:00:05,645] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.PySparkInterpreter, isForceShutdown: true
INFO [2023-04-11 00:00:05,655] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.IPySparkInterpreter, isForceShutdown: true
INFO [2023-04-11 00:00:05,663] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.SparkRInterpreter, isForceShutdown: true
INFO [2023-04-11 00:00:05,670] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.SparkIRInterpreter, isForceShutdown: true
INFO [2023-04-11 00:00:05,679] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.SparkShinyInterpreter, isForceShutdown: true
INFO [2023-04-11 00:00:05,753] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.KotlinSparkInterpreter, isForceShutdown: true
INFO [2023-04-11 00:00:05,806] ({pool-3-thread-1} SchedulerFactory.java[createOrGetFIFOScheduler]:76) - Create FIFOScheduler: interpreter_688737023
INFO [2023-04-11 00:00:05,806] ({pool-3-thread-1} SchedulerFactory.java[<init>]:56) - Scheduler Thread Pool Size: 100
INFO [2023-04-11 00:00:05,810] ({FIFOScheduler-interpreter_688737023-Worker-1} AbstractScheduler.java[runJob]:127) - Job 20210622-101638_112853005 started by scheduler interpreter_688737023
INFO [2023-04-11 00:00:05,818] ({pool-3-thread-2} SchedulerFactory.java[createOrGetFIFOScheduler]:76) - Create FIFOScheduler: interpreter_839216362
INFO [2023-04-11 00:00:05,818] ({pool-3-thread-2} SchedulerFactory.java[createOrGetParallelScheduler]:88) - Create ParallelScheduler: org.apache.zeppelin.spark.SparkSqlInterpreter1135593921 with maxConcurrency: 10
INFO [2023-04-11 00:00:05,857] ({FIFOScheduler-interpreter_688737023-Worker-1} SparkInterpreter.java[extractScalaVersion]:279) - Using Scala: version 2.12.15
INFO [2023-04-11 00:00:05,881] ({FIFOScheduler-interpreter_688737023-Worker-1} SparkScala212Interpreter.scala[createSparkILoop]:182) - Scala shell repl output dir: /tmp/spark16004603505225443508
INFO [2023-04-11 00:00:06,113] ({FIFOScheduler-interpreter_688737023-Worker-1} SparkScala212Interpreter.scala[createSparkILoop]:191) - UserJars: file:/opt/zeppelin/interpreter/spark/spark-interpreter-0.11.0-SNAPSHOT.jar:/opt/zeppelin/interpreter/spark/scala-2.12/spark-scala-2.12-0.11.0-SNAPSHOT.jar
INFO [2023-04-11 00:00:11,260] ({FIFOScheduler-interpreter_688737023-Worker-1} HiveConf.java[findConfigFile]:187) - Found configuration file file:/opt/conda/lib/python3.9/site-packages/pyspark/conf/hive-site.xml
INFO [2023-04-11 00:00:11,438] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Running Spark version 3.3.0
INFO [2023-04-11 00:00:11,472] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - No custom resources configured for spark.driver.
INFO [2023-04-11 00:00:11,472] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - ==============================================================
INFO [2023-04-11 00:00:11,471] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - ==============================================================
INFO [2023-04-11 00:00:11,473] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Submitted application: spark_8g_8g
INFO [2023-04-11 00:00:11,500] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 4, script: , vendor: , memory -> name: memory, amount: 8192, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
INFO [2023-04-11 00:00:11,512] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Limiting resource is cpus at 4 tasks per executor
INFO [2023-04-11 00:00:11,515] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Added ResourceProfile id: 0
INFO [2023-04-11 00:00:11,580] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Changing view acls to: zeppelin,ejavaheri
INFO [2023-04-11 00:00:11,580] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Changing modify acls to: zeppelin,ejavaheri
INFO [2023-04-11 00:00:11,581] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zeppelin, ejavaheri); groups with view permissions: Set(); users with modify permissions: Set(zeppelin, ejavaheri); groups with modify permissions: Set()
INFO [2023-04-11 00:00:11,581] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Changing modify acls groups to:
INFO [2023-04-11 00:00:11,581] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Changing view acls groups to:
INFO [2023-04-11 00:00:11,852] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Successfully started service 'sparkDriver' on port 22321.
INFO [2023-04-11 00:00:11,880] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Registering MapOutputTracker
INFO [2023-04-11 00:00:11,912] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Registering BlockManagerMaster
INFO [2023-04-11 00:00:11,946] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
INFO [2023-04-11 00:00:11,947] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - BlockManagerMasterEndpoint up
INFO [2023-04-11 00:00:11,950] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Registering BlockManagerMasterHeartbeat
INFO [2023-04-11 00:00:11,975] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Created local directory at /tmp/blockmgr-1903d257-be01-4cb7-954f-9a5c13ab0598
INFO [2023-04-11 00:00:11,993] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - MemoryStore started with capacity 4.6 GiB
INFO [2023-04-11 00:00:12,010] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Registering OutputCommitCoordinator
INFO [2023-04-11 00:00:12,079] ({FIFOScheduler-interpreter_688737023-Worker-1} Log.java[initialized]:170) - Logging initialized @9839ms to org.sparkproject.jetty.util.log.Slf4jLog
INFO [2023-04-11 00:00:12,193] ({FIFOScheduler-interpreter_688737023-Worker-1} Server.java[doStart]:375) - jetty-9.4.46.v20220331; built: 2022-03-31T16:38:08.030Z; git: bc17a0369a11ecf40bb92c839b9ef0a8ac50ea18; jvm 11.0.17+8-post-Ubuntu-1ubuntu220.04
INFO [2023-04-11 00:00:12,223] ({FIFOScheduler-interpreter_688737023-Worker-1} Server.java[doStart]:415) - Started @9983ms
INFO [2023-04-11 00:00:12,273] ({FIFOScheduler-interpreter_688737023-Worker-1} AbstractConnector.java[doStart]:333) - Started ServerConnector@325be8be{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
INFO [2023-04-11 00:00:12,274] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Successfully started service 'SparkUI' on port 4040.
INFO [2023-04-11 00:00:12,310] ({FIFOScheduler-interpreter_688737023-Worker-1} ContextHandler.java[doStart]:921) - Started o.s.j.s.ServletContextHandler@47745fce{/,null,AVAILABLE,@Spark}
INFO [2023-04-11 00:00:12,342] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Added JAR file:/opt/zeppelin/interpreter/spark/spark-interpreter-0.11.0-SNAPSHOT.jar at spark://spark2g4g-isolated-2d8reueys-2023-04-1100-00-00-fbvrgw.spark.svc:22321/jars/spark-interpreter-0.11.0-SNAPSHOT.jar with timestamp 1681164011433
INFO [2023-04-11 00:00:12,413] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Auto-configuring K8S client using current context from users K8S config file
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)