Hello everyone,

I'm using Scala and Spark 3.4.1 on Windows 10. While streaming with Spark, I 
set the `cleanSource` option to "archive" and the `sourceArchiveDir` option to 
"archived", as in the code below.

```scala
spark.readStream
  .option("cleanSource", "archive")
  .option("sourceArchiveDir", "archived")
  .option("enforceSchema", false)
  .option("header", includeHeader)
  .option("inferSchema", inferSchema)
  .options(otherOptions)
  .schema(csvSchema.orNull)
  .csv(FileUtils.getPath(sourceSettings.dataFolderPath, 
mappingSource.path).toString)
```

The call `FileUtils.getPath(sourceSettings.dataFolderPath, mappingSource.path)` 
returns a relative path like `test-data\streaming-folder\patients`.

When I start the stream, Spark does not move the source CSV to the archive 
folder. After working on it a bit, I started debugging the Spark source code. I 
found the `override protected def cleanTask(entry: FileEntry): Unit` method in 
the `FileStreamSource.scala` file in the 
`org.apache.spark.sql.execution.streaming` package.
On line 569, the `!fileSystem.rename(curPath, newPath)` call is supposed to 
move the source file to the archive folder. However, while debugging I noticed 
that `curPath` and `newPath` had the following values:

**curPath**: 
`file:/C:/dev/be/data-integration-suite/test-data/streaming-folder/patients/patients-success.csv`

**newPath**: 
`file:/C:/dev/be/data-integration-suite/archived/C:/dev/be/data-integration-suite/test-data/streaming-folder/patients/patients-success.csv`
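Reconstructing what I observed in the debugger, `newPath` looks like the result of appending `curPath`'s entire URI path under `sourceArchiveDir`. This is only my assumption about the internals, sketched here with plain strings:

```scala
// Assumption: newPath is built by concatenating the archive dir with
// curPath's full URI path, which duplicates the base directory.
val archiveDir = "file:/C:/dev/be/data-integration-suite/archived"
val curUriPath = "/C:/dev/be/data-integration-suite/test-data/streaming-folder/patients/patients-success.csv"
val newPath    = archiveDir + curUriPath

// `C:/dev/be/data-integration-suite` now appears twice, and the embedded
// drive letter makes the rename target invalid on Windows:
println(newPath)
```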

It seems the absolute path of the CSV file is appended when building `newPath`, 
since `C:/dev/be/data-integration-suite` appears twice in it. This is why 
Spark's archiving does not work. Instead, `newPath` should be: 
`file:/C:/dev/be/data-integration-suite/archived/test-data/streaming-folder/patients/patients-success.csv`.
I suspect this is a bug in the Spark library itself. Is there a workaround or a 
Spark config option to overcome this problem?
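In the meantime, I'm considering sidestepping `cleanSource=archive` and moving 
processed files myself. A rough sketch using `java.nio.file` (local filesystem 
only; `archiveFile` and its parameters are just illustrative names, not 
anything from Spark):

```scala
import java.nio.file.{Files, Paths, StandardCopyOption}

// Rough manual-archiving sketch that avoids Spark's cleanSource=archive
// path handling entirely: once a file has been processed, move it under
// an archive root, keeping its original file name.
def archiveFile(src: String, archiveRoot: String): java.nio.file.Path = {
  val srcPath = Paths.get(src)
  val dest    = Paths.get(archiveRoot).resolve(srcPath.getFileName)
  Files.createDirectories(dest.getParent) // ensure the archive dir exists
  Files.move(srcPath, dest, StandardCopyOption.REPLACE_EXISTING)
}
```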

Thanks
Best regards,
Yunus Emre
