[ https://issues.apache.org/jira/browse/HDFS-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258567#comment-15258567 ]
Chris Nauroth commented on HDFS-10327:
--------------------------------------

It looks like in that example, myfile.csv is a directory, and its contents are 3 files: _SUCCESS, part-00000 and part-00001. Attempting to open myfile.csv directly as a file definitely won't work. If Spark has a feature that lets you "open" it directly, then perhaps this is implemented at the application layer by Spark? Maybe it does something equivalent to {{hdfs dfs -cat myfile.csv/part*}}?

That last example demonstrates the separation of concerns I'm talking about: the Hadoop shell command performs glob expansion to identify all files matching a pattern, and then it opens and displays each file separately, using HDFS APIs that operate on individual file paths.

> Open files in WEBHDFS which are stored in folders by Spark/Mapreduce
> --------------------------------------------------------------------
>
>                 Key: HDFS-10327
>                 URL: https://issues.apache.org/jira/browse/HDFS-10327
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: webhdfs
>            Reporter: Thomas Hille
>              Labels: features
>
> When Spark saves a file to HDFS, it creates a directory containing many part files. When you read it programmatically with Spark, you can read this directory as if it were a normal file.
> If you try to read this directory-style file in WebHDFS, it returns
> {"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"Path is not a file: [...]
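To make the glob-then-open separation of concerns above concrete, here is a minimal sketch using the Hadoop {{FileSystem}} API. The path and cluster address are hypothetical placeholders; the same pattern works against a {{webhdfs://}} URI as well, because the glob is expanded on the client and every open targets an individual part file.

{code:java}
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CatParts {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical path; "myfile.csv" is a directory of part files.
    Path pattern = new Path("hdfs://namenode:8020/user/alice/myfile.csv/part-*");
    FileSystem fs = pattern.getFileSystem(conf);

    // Client-side glob expansion: find every file matching the pattern...
    FileStatus[] matches = fs.globStatus(pattern);
    if (matches == null) {
      System.err.println("No files match " + pattern);
      return;
    }

    // ...then open and stream each matching file individually, which is
    // roughly what "hdfs dfs -cat myfile.csv/part*" does.
    for (FileStatus status : matches) {
      if (status.isFile()) {
        try (InputStream in = fs.open(status.getPath())) {
          IOUtils.copyBytes(in, System.out, 4096, false);
        }
      }
    }
  }
}
{code}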
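The behavior in the description can be reproduced over the raw REST API as well: OPEN on the directory itself is what yields the "Path is not a file" error, while LISTSTATUS on the directory followed by OPEN on each part file succeeds. Below is a sketch with plain {{java.net.HttpURLConnection}}; the host, port, and paths are placeholders, and a secured cluster may additionally need authentication parameters such as {{user.name}}.

{code:java}
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsDemo {
  // Placeholder NameNode web address; adjust for your cluster.
  private static final String BASE = "http://namenode.example.com:50070/webhdfs/v1";

  public static void main(String[] args) throws Exception {
    // OPEN on the directory would fail with FileNotFoundException
    // ("Path is not a file"). The client-side workaround: LISTSTATUS
    // the directory, then OPEN each part file individually.
    get(BASE + "/user/alice/myfile.csv?op=LISTSTATUS");      // JSON file listing
    get(BASE + "/user/alice/myfile.csv/part-00000?op=OPEN"); // one part's bytes
  }

  private static void get(String url) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    // OPEN answers with a redirect to a DataNode; HttpURLConnection follows
    // same-protocol redirects for GET automatically.
    try (InputStream in = conn.getInputStream()) {
      byte[] buf = new byte[8192];
      for (int n; (n = in.read(buf)) != -1; ) {
        System.out.write(buf, 0, n);
      }
    }
    System.out.println();
  }
}
{code}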