angelosnm opened a new issue, #1723:
URL: https://github.com/apache/sedona/issues/1723
I have set up a standalone Spark cluster to which PySpark jobs are submitted. These jobs use the configuration below, with S3/MinIO serving as the distributed filesystem (via the S3A connector) for reading raster files:
```python
import socket

from sedona.spark import SedonaContext

config = (
    SedonaContext.builder()
    .master(spark_endpoint)
    .appName("RasterProcessingWithSedona")
    # advertise the driver's resolvable address and pin the ports so
    # executors can connect back
    .config("spark.driver.host", socket.gethostbyname(socket.gethostname()))
    .config("spark.driver.port", "2222")
    .config("spark.blockManager.port", "36859")
    .config("spark.executor.memory", "16g")
    .config("spark.executor.cores", "4")
    .config("spark.driver.memory", "10g")
    # S3A settings for the MinIO endpoint
    .config("spark.hadoop.fs.s3a.endpoint", s3_endpoint)
    .config("spark.hadoop.fs.s3a.access.key", s3_access_key_id)
    .config("spark.hadoop.fs.s3a.secret.key", s3_secret_access_key)
    .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .config(
        "spark.jars.packages",
        "org.apache.sedona:sedona-spark-shaded-3.5_2.12:1.6.1,"
        "org.datasyslab:geotools-wrapper:1.6.1-28.2",
    )
    .getOrCreate()
)
# the snippets below reference `sedona`, created from the configured session
sedona = SedonaContext.create(config)
```
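One thing I am not sure about: the package list above does not include the S3A connector itself. A minimal sketch, assuming the cluster runs Hadoop 3.3.x and does not already bundle `hadoop-aws` (both the version and the bundling are assumptions on my part):

```python
# Hypothetical variant of the spark.jars.packages value above, adding the S3A
# connector explicitly. The hadoop-aws version must match the Hadoop build the
# cluster ships with (3.3.4 here is an assumption).
packages = ",".join([
    "org.apache.sedona:sedona-spark-shaded-3.5_2.12:1.6.1",
    "org.datasyslab:geotools-wrapper:1.6.1-28.2",
    "org.apache.hadoop:hadoop-aws:3.3.4",
])
# ...and in the builder chain: .config("spark.jars.packages", packages)
```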
The raster (`.tif`) files are then read as follows:
```python
raster_path = "s3a://data/BFA"
rawDf = (
    sedona.read.format("binaryFile")
    .option("recursiveFileLookup", "true")
    .option("pathGlobFilter", "*.tif*")
    .load(raster_path)
)
rawDf.createOrReplaceTempView("rawdf")
rawDf.show()
```
Running this code produces the error shown under "Actual behavior". The same code runs normally in local mode.
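Since it only fails on the cluster, this looks like a driver/executor networking issue: the `TaskResultLost (result lost from block manager)` in the trace below generally means the driver could not fetch a completed task's result from an executor's block manager. A minimal sketch of builder settings that are commonly relevant in that situation (the bind address and the `driver_routable_ip` variable are assumptions for illustration, not values from my setup):

```python
# Hypothetical additions to the builder chain above, before .getOrCreate().
# Every executor must be able to reach the driver on spark.driver.port (2222)
# and spark.blockManager.port (36859) for task results to be fetched back.
builder = (
    SedonaContext.builder()
    .master(spark_endpoint)
    # assumption: the driver runs in a container or behind NAT, so bind all
    # interfaces while advertising a routable address via spark.driver.host
    .config("spark.driver.bindAddress", "0.0.0.0")
    .config("spark.driver.host", driver_routable_ip)  # hypothetical variable
)
```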
## Expected behavior
`rawDf.show()` should display the loaded raster files, as it does in local mode.
## Actual behavior
```bash
Py4JJavaError: An error occurred while calling o66.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 5) (192.168.18.112 executor 1): TaskResultLost (result lost from block manager)
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2856)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2792)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2791)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2791)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1247)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1247)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1247)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3060)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2994)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2983)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:989)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2393)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2414)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2433)
	at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:530)
	at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:483)
	at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:61)
	at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:4334)
	at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:3316)
	at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:4324)
	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:4322)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4322)
	at org.apache.spark.sql.Dataset.head(Dataset.scala:3316)
	at org.apache.spark.sql.Dataset.take(Dataset.scala:3539)
	at org.apache.spark.sql.Dataset.getRows(Dataset.scala:280)
	at org.apache.spark.sql.Dataset.showString(Dataset.scala:315)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.lang.Thread.run(Thread.java:750)
```
## Steps to reproduce the problem
Run the configuration and read snippets above against a standalone Spark cluster with S3A/MinIO storage.
## Settings
Sedona version = 1.6.1
Apache Spark version = 3.5.2
Apache Flink version = N/A
API type = Python
Scala version = 2.12
JRE version = 1.8.0_432
Python version = 3.11.10
Environment = Standalone