Jia Yu created SEDONA-495:
-----------------------------

             Summary: Raster data source uses shared FileSystem connections 
which lead to race condition
                 Key: SEDONA-495
                 URL: https://issues.apache.org/jira/browse/SEDONA-495
             Project: Apache Sedona
          Issue Type: Bug
            Reporter: Jia Yu


The raster data source's OutputWriter calls `new 
Path(savePath).getFileSystem(context.getConfiguration)` to get a Hadoop 
FileSystem instance, and one OutputWriter is instantiated per task. By default 
this call returns a cached FileSystem, so all tasks on an executor share the 
same connection.
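
As a rough illustration of that sharing, the sketch below looks up the 
FileSystem twice the way two tasks on one executor would, and compares the 
results by reference. It assumes hadoop-common is on the classpath and that 
FileSystem caching has not been disabled in the configuration; the object name 
and the `file:///tmp/raster-out` path are hypothetical and are not taken from 
Sedona.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical demo object, not part of Sedona.
object SharedFileSystemDemo {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Hypothetical output location; the path does not need to exist.
    val savePath = new Path("file:///tmp/raster-out")

    // Two lookups, as two concurrent tasks on the same executor would do.
    val fsA = savePath.getFileSystem(conf)
    val fsB = savePath.getFileSystem(conf)
    // With caching enabled, both names are expected to refer to one object,
    // so closing fsA would also close fsB.
    println(s"getFileSystem returns a shared instance: ${fsA eq fsB}")

    // newInstance bypasses the shared cache entry and builds a separate object.
    val own = FileSystem.newInstance(savePath.toUri, conf)
    println(s"newInstance returns a shared instance: ${own eq fsA}")
    own.close() // closes only this private instance
  }
}
```

The local file system is used here only so the comparison runs without a 
cluster; the same caching applies to remote schemes such as HDFS.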

 

https://github.com/apache/sedona/blob/master/spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/io/raster/RasterFileFormat.scala#L85

 

A multi-core executor commonly runs multiple tasks concurrently (one task per 
core). In the current implementation, when one task completes it closes the 
shared connection, and all other tasks still running on that executor fail 
with an IOException.

 

The best practice is to call `FileSystem.newInstance` in each task, so that 
each task owns and closes its own connection.
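
A minimal sketch of that approach follows, assuming hadoop-common is on the 
classpath. `PerTaskWriter` and its methods are hypothetical stand-ins for a 
per-task writer and are not the actual Sedona OutputWriter; only 
`FileSystem.newInstance`, `FileSystem.create` and `FileSystem.close` are real 
Hadoop calls.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical per-task writer; one instance is created for each task.
class PerTaskWriter(savePath: String, conf: Configuration) {
  private val outDir = new Path(savePath)

  // A private FileSystem instance: closing it does not affect other tasks.
  private val fs: FileSystem = FileSystem.newInstance(outDir.toUri, conf)

  // Write one file under the output directory, always releasing the stream.
  def write(fileName: String, bytes: Array[Byte]): Unit = {
    val out = fs.create(new Path(outDir, fileName))
    try out.write(bytes)
    finally out.close()
  }

  // Called once when the task finishes; releases only this task's connection.
  def close(): Unit = fs.close()
}
```

Each task would construct its own `PerTaskWriter`, call `write` for every 
raster it emits, and call `close()` at the end. The trade-off is one 
connection per task instead of one per executor, in exchange for tasks that no 
longer interfere with each other on close.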

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
