Re: $SPARK_HOME/sbin/start-worker.sh spark://{main_host}:{cluster_port} failing

2023-08-23 Thread Mich Talebzadeh
Hi Jeremy,

This error concerns me

"23/08/23 20:01:03 ERROR LevelDBProvider: error opening leveldb file
file:/mnt/data_ebs/infrastructure/spark/tmp/registeredExecutors.ldb.
Creating new file, will not be able to recover state for existing
applications org.fusesource.leveldbjni.internal.NativeDB$DBException: IO
error: /opt/spark/spark-3.3.1-bin-hadoop3/sbin/file:/mnt/data_
ebs/infrastructure/spark/tmp/registeredExecutors.ldb/LOCK: No such file or
directory"

Are you running this in standalone mode on your EC2 instances? Has it run OK
before? I have not used standalone mode for a long time :(

The path mentioned in the error,
/opt/spark/spark-3.3.1-bin-hadoop3/sbin/file:/mnt/data_ebs/infrastructure/spark/tmp/registeredExecutors.ldb/LOCK,
looks dodgy, so to speak, especially the /sbin/file: in the middle, which
could be due to a misconfiguration or incorrect file path concatenation.
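
My guess, and it is only a guess that your config would need to confirm, is
that the directory holding registeredExecutors.ldb is defined somewhere as a
file: URI, for example in SPARK_LOCAL_DIRS or spark.local.dir. The JVM treats
a path that does not start with / as relative, so it is resolved against the
directory the daemon was launched from, which here is $SPARK_HOME/sbin. A
hypothetical setting like this would produce exactly the path in your error:

export SPARK_LOCAL_DIRS=file:/mnt/data_ebs/infrastructure/spark/tmp   # hypothetical, note the file: prefix

because the shuffle service would then try to open
/opt/spark/spark-3.3.1-bin-hadoop3/sbin/file:/mnt/data_ebs/infrastructure/spark/tmp/registeredExecutors.ldb/LOCK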

First aid (a rough sketch of these checks follows the list):

   - Check the path configuration: make sure the path to the LevelDB file is
   set correctly in your configuration. It should point to a valid directory
   path, without unexpected prefixes like /sbin/ or file:.
   - Does the user running Spark have sufficient permissions to access and
   modify files in the specified directory?
   - Check whether the directory
   /mnt/data_ebs/infrastructure/spark/tmp/registeredExecutors.ldb exists and
   contains the required LevelDB files. If the directory or files are
   missing, that could be the reason for the error.
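
Roughly, those three checks from the shell. The paths are the ones from your
log; <spark_user> is a placeholder for whatever account starts the worker,
and spark-env.sh is only an example of where the directory might be set:

# 1. use a plain absolute path, with no file: prefix, wherever the directory is configured
export SPARK_LOCAL_DIRS=/mnt/data_ebs/infrastructure/spark/tmp

# 2. confirm that user can create and delete files there
sudo -u <spark_user> sh -c 'touch /mnt/data_ebs/infrastructure/spark/tmp/.rw_test && rm /mnt/data_ebs/infrastructure/spark/tmp/.rw_test'

# 3. check whether the directory and the LevelDB files exist at all
ls -ld /mnt/data_ebs/infrastructure/spark/tmp
ls -l /mnt/data_ebs/infrastructure/spark/tmp/registeredExecutors.ldb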


HTH for now


Mich Talebzadeh,
Distinguished Technologist, Solutions Architect & Engineer
London
United Kingdom


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




$SPARK_HOME/sbin/start-worker.sh spark://{main_host}:{cluster_port} failing

2023-08-23 Thread Jeremy Brent
Hi Spark Community,

We have a cluster running Spark 3.3.1. All nodes are AWS EC2 instances
running Ubuntu 22.04.

One of the workers disconnected from the main node. When we run
$SPARK_HOME/sbin/start-worker.sh spark://{main_host}:{cluster_port} it
appears to run successfully; there is no stderr when we run the
aforementioned command. Stdout returns the following:
starting org.apache.spark.deploy.worker.Worker, logging to
/opt/spark/spark-3.3.1-bin-hadoop3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-ri-worker-1.out

However, when we look at the log file defined in stdout, we see the
following:
Spark Command: /usr/lib/jvm/java-11-openjdk-amd64/bin/java -cp
/opt/spark/spark-3.3.1-bin-hadoop3/conf/:/opt/spark/spark-3.3.1-bin-hadoop3/jars/*
-Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://
10.113.62.58:7077

Using Spark's default log4j profile:
org/apache/spark/log4j2-defaults.properties
23/08/23 20:01:02 INFO Worker: Started daemon with process name:
940835@ri-worker-1
23/08/23 20:01:02 INFO SignalUtils: Registering signal handler for TERM
23/08/23 20:01:02 INFO SignalUtils: Registering signal handler for HUP
23/08/23 20:01:02 INFO SignalUtils: Registering signal handler for INT
23/08/23 20:01:03 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
23/08/23 20:01:03 INFO SecurityManager: Changing view acls to: root
23/08/23 20:01:03 INFO SecurityManager: Changing modify acls to: root
23/08/23 20:01:03 INFO SecurityManager: Changing view acls groups to:
23/08/23 20:01:03 INFO SecurityManager: Changing modify acls groups to:
23/08/23 20:01:03 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users  with view permissions: Set(root); groups
with view permissions: Set(); users  with modify permissions: Set(root);
groups with modify permissions: Set()
23/08/23 20:01:03 INFO Utils: Successfully started service 'sparkWorker' on
port 43757.
23/08/23 20:01:03 INFO Worker: Worker decommissioning not enabled.
23/08/23 20:01:03 ERROR LevelDBProvider: error opening leveldb file
file:/mnt/data_ebs/infrastructure/spark/tmp/registeredExecutors.ldb.
Creating new file, will not be able to recover state for existing
applications
org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error:
/opt/spark/spark-3.3.1-bin-hadoop3/sbin/file:/mnt/data_ebs/infrastructure/spark/tmp/registeredExecutors.ldb/LOCK:
No such file or directory
at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
at org.apache.spark.network.util.LevelDBProvider.initLevelDB(LevelDBProvider.java:48)
at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:126)
at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:99)
at org.apache.spark.network.shuffle.ExternalBlockHandler.<init>(ExternalBlockHandler.java:81)
at org.apache.spark.deploy.ExternalShuffleService.newShuffleBlockHandler(ExternalShuffleService.scala:82)
at org.apache.spark.deploy.ExternalShuffleService.<init>(ExternalShuffleService.scala:56)
at org.apache.spark.deploy.worker.Worker.<init>(Worker.scala:183)
at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:966)
at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:934)
at org.apache.spark.deploy.worker.Worker.main(Worker.scala)
23/08/23 20:01:03 WARN LevelDBProvider: error deleting
file:/mnt/data_ebs/infrastructure/spark/tmp/registeredExecutors.ldb
23/08/23 20:01:03 ERROR SparkUncaughtExceptionHandler: Uncaught exception
in thread Thread[main,5,main]
java.io.IOException: Unable to create state store
at org.apache.spark.network.util.LevelDBProvider.initLevelDB(LevelDBProvider.java:77)
at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:126)
at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:99)
at org.apache.spark.network.shuffle.ExternalBlockHandler.<init>(ExternalBlockHandler.java:81)
at org.apache.spark.deploy.ExternalShuffleService.newShuffleBlockHandler(ExternalShuffleService.scala:82)
at org.apache.spark.deploy.ExternalShuffleService.<init>(ExternalShuffleService.scala:56)
at org.apache.spark.deploy.worker.Worker.<init>(Worker.scala:183)
at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:966)
at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:934)
at org.apache.spark.deploy.worker.Worker.main(Worker.scala)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO
error: