Hi. Thanks for your answers, but to read Parquet files, isn't it necessary to use the parquetFile method in org.apache.spark.sql.SQLContext?
How can I combine your solution with the call to this method? Thanks! Regards

On Thu, Mar 12, 2015 at 8:34 AM, Yijie Shen <henry.yijies...@gmail.com> wrote:

> org.apache.spark.deploy.SparkHadoopUtil has a method:
>
> /**
>  * Get [[FileStatus]] objects for all leaf children (files) under the
>  * given base path. If the given path points to a file, return a
>  * single-element collection containing [[FileStatus]] of that file.
>  */
> def listLeafStatuses(fs: FileSystem, basePath: Path): Seq[FileStatus] = {
>   def recurse(path: Path) = {
>     val (directories, leaves) = fs.listStatus(path).partition(_.isDir)
>     leaves ++ directories.flatMap(f => listLeafStatuses(fs, f.getPath))
>   }
>
>   val baseStatus = fs.getFileStatus(basePath)
>   if (baseStatus.isDir) recurse(basePath) else Array(baseStatus)
> }
>
> --
> Best Regards!
> Yijie Shen
>
> On March 12, 2015 at 2:35:49 PM, Akhil Das (ak...@sigmoidanalytics.com) wrote:
>
> Hi
>
> We have a custom build that reads directories recursively. Currently we
> use it with fileStream like:
>
> val lines = ssc.fileStream[LongWritable, Text, TextInputFormat](
>   "/datadumps/",
>   (t: Path) => true, true, true)
>
> Setting the 4th argument to true enables the recursive read.
>
> You could give it a try:
> https://s3.amazonaws.com/sigmoidanalytics-builds/spark-1.2.0-bin-spark-1.2.0-hadoop2.4.0.tgz
>
> Thanks
> Best Regards
>
> On Wed, Mar 11, 2015 at 9:45 PM, Masf <masfwo...@gmail.com> wrote:
>
>> Hi all
>>
>> Is it possible to read folders recursively in order to read Parquet files?
>>
>> Thanks.
>>
>> --
>> Regards,
>> Miguel Ángel
>

--
Regards,
Miguel Ángel
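P.S. To make my question concrete, below is the untested kind of combination I have in mind: the same recursion as the listLeafStatuses method you quoted (rewritten here against the plain Hadoop FileSystem API, since SparkHadoopUtil is an internal class), keeping only the .parquet data files and handing the paths to SQLContext.parquetFile. The variadic parquetFile(paths: String*) assumes Spark 1.3; on 1.2 it takes a single path, so each path would have to be loaded separately and the results unioned. The /datadumps path and the table name are just placeholders.

import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object RecursiveParquet {
  // Same recursion as SparkHadoopUtil.listLeafStatuses quoted above:
  // collect every leaf file under basePath.
  def listLeafStatuses(fs: FileSystem, basePath: Path): Seq[FileStatus] = {
    val baseStatus = fs.getFileStatus(basePath)
    if (baseStatus.isDir) {
      val (directories, leaves) = fs.listStatus(basePath).partition(_.isDir)
      leaves ++ directories.flatMap(d => listLeafStatuses(fs, d.getPath))
    } else {
      Seq(baseStatus)
    }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RecursiveParquet"))
    val sqlContext = new SQLContext(sc)
    val fs = FileSystem.get(sc.hadoopConfiguration)

    // Keep only the .parquet data files; skip _SUCCESS/_metadata markers.
    val parquetPaths = listLeafStatuses(fs, new Path("/datadumps"))
      .map(_.getPath.toString)
      .filter(_.endsWith(".parquet"))

    // Spark 1.3: parquetFile takes varargs. On Spark 1.2 it takes a
    // single path, so load each path and union the resulting SchemaRDDs.
    val data = sqlContext.parquetFile(parquetPaths: _*)
    data.registerTempTable("my_parquet_data")
  }
}

Is something along these lines the right way to do it?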