Jorge Machado created SPARK-30647:
-------------------------------------

             Summary: When creating a custom datasource FileNotFoundException happens
                 Key: SPARK-30647
                 URL: https://issues.apache.org/jira/browse/SPARK-30647
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.3.2
            Reporter: Jorge Machado


Hello, I'm creating a datasource based on FileFormat and DataSourceRegister.

When I pass a path or a file name that contains a space, the read fails with the following error:
{code:java}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 1 times, most recent failure: Lost task 1.0 in stage 2.0 (TID 213, localhost, executor driver): java.io.FileNotFoundException: File file:somePath/0019_leftImg8%20bit.png does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
    at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:125)
    at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
    at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
{code}
I'm happy to fix this if someone tells me where I need to look.

I think the problem is in org.apache.spark.rdd.InputFileBlockHolder:
{code:java}
inputBlock.set(new FileBlock(UTF8String.fromString(filePath), startOffset, length))
{code}
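
To illustrate what I suspect is going on: the "%20" in the error message suggests the path was URI-encoded somewhere and then used as a literal file name. The sketch below is plain Java, not Spark code. The class name EncodedPathDemo and its two helper methods are made up for illustration, and whether Spark actually takes this route is an assumption on my part, not a confirmed diagnosis. It only shows that a percent-encoded path and its decoded form are not interchangeable when opening a local file.

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; nothing here is part of Spark or Hadoop.
public class EncodedPathDemo {

    // Uses the raw, still percent-encoded URI path as a literal file name.
    public static boolean existsUsingRawPath(URI uri) {
        return new File(uri.getRawPath()).exists();
    }

    // Decodes percent-escapes first, so "%20" becomes a real space again.
    public static boolean existsUsingDecodedPath(URI uri) {
        return new File(uri.getPath()).exists();
    }

    public static void main(String[] args) throws IOException {
        // Create a real file whose name contains a space, as in the report.
        Path dir = Files.createTempDirectory("spark-30647-demo");
        Path file = Files.createFile(dir.resolve("0019_leftImg8 bit.png"));

        // Path.toUri() percent-encodes the space in the file name.
        URI uri = file.toUri();

        System.out.println("URI:                 " + uri);
        System.out.println("raw path exists:     " + existsUsingRawPath(uri));
        System.out.println("decoded path exists: " + existsUsingDecodedPath(uri));

        // Clean up the temporary file and directory.
        Files.delete(file);
        Files.delete(dir);
    }
}
```

On a Unix-like system I would expect the raw-path lookup to miss the file and the decoded lookup to find it, which matches the shape of the FileNotFoundException above. If that is the cause, a custom FileFormat may need to decode the file path string (or build it from a URI) before opening it.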
 


