[ 
https://issues.apache.org/jira/browse/SPARK-30647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024242#comment-17024242
 ] 

Jorge Machado commented on SPARK-30647:
---------------------------------------

I found a way to overcome this. I just replace %20 with " " like this:
{code:java}
(file: PartitionedFile) => {
    val origin = file.filePath.
            replace("%20", " ")
}{code}

> When creating a custom datasource File NotFoundExpection happens
> ----------------------------------------------------------------
>
>                 Key: SPARK-30647
>                 URL: https://issues.apache.org/jira/browse/SPARK-30647
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.2
>            Reporter: Jorge Machado
>            Priority: Major
>
> Hello, I'm creating a datasource based on FileFormat and DataSourceRegister. 
> when I pass a path or a file that has a white space it seems to fail wit the 
> error: 
> {code:java}
>  org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in 
> stage 2.0 failed 1 times, most recent failure: Lost task 1.0 in stage 2.0 
> (TID 213, localhost, executor driver): java.io.FileNotFoundException: File 
> file:somePath/0019_leftImg8%20bit.png does not exist It is possible the 
> underlying files have been updated. You can explicitly invalidate the cache 
> in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating 
> the Dataset/DataFrame involved. at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
>  at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
>  at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source) at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
>  at 
> org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:125)
>  at 
> org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
>  at 
> org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
>  at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091) at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
> {code}
> I'm happy to fix this if someone tells me where I need to look.  
> I think it is on org.apache.spark.rdd.InputFileBlockHolder : 
> {code:java}
> inputBlock.set(new FileBlock(UTF8String.fromString(filePath), startOffset, 
> length))
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to