Jia Yu created SEDONA-495:
-----------------------------

             Summary: Raster data source uses shared FileSystem connections which lead to race condition
                 Key: SEDONA-495
                 URL: https://issues.apache.org/jira/browse/SEDONA-495
             Project: Apache Sedona
          Issue Type: Bug
            Reporter: Jia Yu

The raster data source's OutputWriter calls `new Path(savePath).getFileSystem(context.getConfiguration)` to get a Hadoop FileSystem instance, and one OutputWriter is instantiated per task. That call does not open a new connection: it returns a cached FileSystem instance that is shared by all tasks on an executor.

https://github.com/apache/sedona/blob/master/spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/io/raster/RasterFileFormat.scala#L85

A multi-core executor commonly runs several tasks concurrently (one task per core). In the current implementation, the first task to complete closes the shared connection, and every other task still running on that executor then fails with an IO exception.

The best practice is to call `FileSystem.newInstance` so that each task gets its own instance.
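
A minimal sketch of the suggested fix, assuming hadoop-common is on the classpath. The names `PerTaskFileSystem` and `openPrivate` and the demo path under `/tmp` are hypothetical and are not taken from the Sedona code; the sketch only illustrates that `FileSystem.newInstance` hands each caller an instance it can close without touching anyone else's.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper for illustration; not part of Sedona.
object PerTaskFileSystem {

  // Returns a FileSystem that belongs to the caller alone.
  // FileSystem.newInstance bypasses the cache used by Path.getFileSystem,
  // so closing the result does not affect any other task.
  def openPrivate(savePath: String, conf: Configuration): FileSystem = {
    val path = new Path(savePath)
    FileSystem.newInstance(path.toUri, conf)
  }

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Assumed demo location on the local file system.
    val savePath = "file:///tmp/sedona-495-demo"

    // Two "tasks" on the same executor, each with its own instance.
    val taskA = openPrivate(savePath, conf)
    val taskB = openPrivate(savePath, conf)

    // Task A finishes first and closes only its own instance.
    taskA.close()

    // Task B keeps writing through its own, still open, instance.
    val out = taskB.create(new Path(savePath, "part-b.bin"), true)
    try out.write(Array[Byte](1, 2, 3))
    finally out.close()
    taskB.close()
  }
}
```

With this pattern the writer that opened the instance is also responsible for closing it, typically in its own `close()` method, because the instance is no longer managed by the shared cache.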