When writing with the spark-csv package, or saving as text files, you end up 
with output named:

test.csv/part-00000

rather than a more user-friendly "test.csv", even when there is only one part 
file.
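
For context, here is a minimal sketch of the kind of write that produces that 
layout (assuming Spark 1.4+ with the com.databricks spark-csv package on the 
classpath; the toy DataFrame and paths are just for illustration):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

val sc = new SparkContext("local[*]", "csv-demo")
val sqlContext = new SQLContext(sc)
val df = sqlContext.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "value")

// save() creates a *directory* named test.csv containing one part file
// per partition (part-00000, ...) plus _SUCCESS, not a single CSV file
df.write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("test.csv")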

We can merge the part files into a single file using Hadoop's 
FileUtil.copyMerge, with something like this code from 
http://deploymentzone.com/2015/01/30/spark-and-merged-csv-files/:


import java.net.URI
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
import org.apache.spark.SparkContext

def merge(sc: SparkContext, srcPath: String, dstPath: String): Unit = {
  val srcFileSystem = FileSystem.get(new URI(srcPath), sc.hadoopConfiguration)
  val dstFileSystem = FileSystem.get(new URI(dstPath), sc.hadoopConfiguration)
  // Clear out any existing file at the destination
  dstFileSystem.delete(new Path(dstPath), true)
  // Concatenate every part file under srcPath into a single file at dstPath;
  // deleteSource = true removes the source directory afterwards
  FileUtil.copyMerge(srcFileSystem, new Path(srcPath), dstFileSystem,
    new Path(dstPath), true, sc.hadoopConfiguration, null)
}
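
Calling it after the write above would look something like this (the paths 
are just placeholders):

// Merges test.csv/part-* into a single file test-merged.csv and,
// because deleteSource is true, removes the test.csv directory
merge(sc, "test.csv", "test-merged.csv")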

but does anyone know of a way to do this without dropping down to the Hadoop 
FileSystem API?

Thanks,
Ewan
