I did some experiments and it seems not. But I would like to get confirmation (or
perhaps I missed something). If it does support this, could you let me know how to
specify multiple folders? Thanks.
Senqiang
Thanks Ted. Actually, a follow-up question. I need to read multiple HDFS files
into an RDD. What I am doing now is: I read each file into its own RDD, then
later on I union all these RDDs into one RDD. I am not sure if it is the best
way to do it.
Thanks
Senqiang
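For comparison, here is a minimal sketch of the two approaches side by side: the per-file union described above, and a single textFile call over a comma-separated path list (which Hadoop's FileInputFormat accepts). The object name MultiPathRead and its helpers are hypothetical names used only for illustration, and the sketch assumes no path contains a comma.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of Spark.
object MultiPathRead {
  // Joins paths into the comma-separated form that FileInputFormat splits on.
  // Assumption: no individual path contains a comma.
  def joinPaths(paths: Seq[String]): String = paths.mkString(",")

  // Option 1: one RDD per path, then a single union over all of them.
  def readByUnion(sc: SparkContext, paths: Seq[String]): RDD[String] =
    sc.union(paths.map(p => sc.textFile(p)))

  // Option 2: hand every path to one textFile call.
  def readByPathList(sc: SparkContext, paths: Seq[String]): RDD[String] =
    sc.textFile(joinPaths(paths))
}
```

Both should yield the same lines. The second form builds one RDD instead of a union of many, which tends to be simpler when the number of files is large.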
On Tuesday, March 3, 2015
Looking at FileInputFormat#listStatus():
// Whether we need to recursive look into the directory structure
boolean recursive = job.getBoolean(INPUT_DIR_RECURSIVE, false);
where:
public static final String INPUT_DIR_RECURSIVE =
"mapreduce.input.fileinputformat.input.dir.recursive";
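A minimal sketch of how that flag might be switched on from Spark before calling textFile. It assumes a Hadoop version whose FileInputFormat#listStatus() reads the flag as quoted above; RecursiveRead and textFileRecursive are hypothetical names used only for illustration.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of Spark.
object RecursiveRead {
  // Same key as the INPUT_DIR_RECURSIVE constant quoted above.
  val RecursiveKey = "mapreduce.input.fileinputformat.input.dir.recursive"

  def textFileRecursive(sc: SparkContext, path: String): RDD[String] = {
    // Note: this changes the context-wide Hadoop configuration,
    // so later reads on the same SparkContext see the flag as well.
    sc.hadoopConfiguration.setBoolean(RecursiveKey, true)
    sc.textFile(path)
  }
}
```

In the shell, the equivalent would be to set the key on sc.hadoopConfiguration once and then call sc.textFile as usual.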
Thanks for the confirmation, Stephen.
On Tue, Mar 3, 2015 at 3:53 PM, Stephen Boesch java...@gmail.com wrote:
Thanks, I was looking at an old version of FileInputFormat.
BEFORE setting the recursive config
(mapreduce.input.fileinputformat.input.dir.recursive):
scala> sc.textFile("dev/*").count
java.io.IOException: *Not a file*:
file:/shared/sparkup/dev/audit-release/blank_maven_build
The default is
The sc.textFile() call invokes the Hadoop FileInputFormat via its subclass
TextInputFormat. Inside, the logic does exist to read directories
recursively, i.e. first detecting whether an entry is a directory and, if so,
descending into it:
for (FileStatus
Looking at scaladoc:
/** Get an RDD for a Hadoop file with an arbitrary new API InputFormat. */
def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]]
Your conclusion is confirmed.
On Tue, Mar 3, 2015 at 1:59 PM, S. Zhou myx...@yahoo.com.invalid wrote:
I did some experiments and it seems
This API reads a directory of files, not one file. A file here
really means a directory full of part-* files. You do not need to read
those separately.
Any syntax that works with Hadoop's FileInputFormat should work. I
thought you could specify a comma-separated list of paths? maybe I am
Thanks guys. So does this recursive tag work for newAPIHadoopFile?
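One way to try this out is sketched below, under the assumption that the new-API FileInputFormat in your Hadoop version honors the same key; this is not confirmed in the thread. The sketch passes a private Configuration copy to newAPIHadoopFile. NewApiRecursiveRead and readLines are hypothetical names used only for illustration.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of Spark.
object NewApiRecursiveRead {
  val RecursiveKey = "mapreduce.input.fileinputformat.input.dir.recursive"

  def readLines(sc: SparkContext, path: String): RDD[String] = {
    // Copy the context's configuration so the flag stays local to this read.
    val conf = new Configuration(sc.hadoopConfiguration)
    conf.setBoolean(RecursiveKey, true)

    sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat](
        path,
        classOf[TextInputFormat],
        classOf[LongWritable],
        classOf[Text],
        conf)
      // Hadoop reuses Text instances, so copy each line out as a String.
      .map { case (_, line) => line.toString }
  }
}
```

Comparing the count from this call against a non-recursive read of the same directory would show whether nested files are actually being picked up.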
On Tuesday, March 3, 2015 3:55 PM, Ted Yu yuzhih...@gmail.com wrote: