[jira] [Created] (SEDONA-611) Cannot write rasters to S3 on EMR

Kristin Cowalcijk (Jira) Thu, 20 Jun 2024 07:04:25 -0700

Kristin Cowalcijk created SEDONA-611:
----------------------------------------


             Summary: Cannot write rasters to S3 on EMR
                 Key: SEDONA-611
                 URL: https://issues.apache.org/jira/browse/SEDONA-611
             Project: Apache Sedona
          Issue Type: Bug
    Affects Versions: 1.6.0
            Reporter: Kristin Cowalcijk
             Fix For: 1.6.1


This bug was reported by a user on Discord. Writing raster data back to S3 on 
EMR raises the following error.

Error:
{code}
Caused by: java.lang.IllegalArgumentException: Pathname s3/...../9db15e93-3831-4066-ba1b-1f3bc364dc98.tiff is not a valid DFS filename.
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName
{code}

Code snippet:

{code:python}
(ndvi_geotiff.write.format("raster")
 .option("rasterField", "geotiff")
 .option("fileExtension", ".tiff")
 .mode("overwrite")
 .save("s3://varun-poc-emr-bootstrap/raster/"))
{code}

I tried to reproduce this problem on emr-7.1.0. The write failed with the 
following exception thrown on an executor:

{code}
org.apache.hadoop.fs.staging.StagingDirectoryNotFoundException: Staging directory not found under path s3://bucket-name/tmp/write_geotiff with stage name "0_attempt_202406201300423084972650467585554_0009_m_000001_13"
        at com.amazon.ws.emr.hadoop.fs.staging.metadata.inmemory.InMemoryStagingMetadataStore.newDirectoryNotFoundException(InMemoryStagingMetadataStore.java:225)
        at com.amazon.ws.emr.hadoop.fs.staging.metadata.inmemory.InMemoryStagingMetadataStore.getDirectoryOrFail(InMemoryStagingMetadataStore.java:184)
        at com.amazon.ws.emr.hadoop.fs.staging.metadata.inmemory.InMemoryStagingMetadataStore.createFile(InMemoryStagingMetadataStore.java:114)
        at com.amazon.ws.emr.hadoop.fs.s3.upload.plan.StagingUploadPlanner.plan(StagingUploadPlanner.java:61)
        at com.amazon.ws.emr.hadoop.fs.s3.upload.plan.UploadPlannerChain.plan(UploadPlannerChain.java:37)
        at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.create(S3NativeFileSystem.java:351)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1240)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1217)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1085)
        at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.create(EmrFileSystem.java:202)
        at org.apache.spark.sql.sedona_sql.io.raster.RasterFileWriter.write(RasterFileFormat.scala:112)
        at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:175)
        at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:85)
        at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:92)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:404)
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1409)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:411)
        at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
        at org.apache.spark.scheduler.Task.run(Task.scala:143)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:629)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:632)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
