Hi. Thanks for your answers, but to read Parquet files, isn't it necessary to use the parquetFile method in org.apache.spark.sql.SQLContext?
How can I combine your solution with the call to this method? Thanks! Regards

On Thu, Mar 12, 2015 at 8:34 AM, Yijie Shen <henry.yijies...@gmail.com> wrote:

> org.apache.spark.deploy.SparkHadoopUtil has a method:
>
> /**
>  * Get [[FileStatus]] objects for all leaf children (files) under the
>  * given base path. If the given path points to a file, return a
>  * single-element collection containing [[FileStatus]] of that file.
>  */
> def listLeafStatuses(fs: FileSystem, basePath: Path): Seq[FileStatus] = {
>   def recurse(path: Path) = {
>     val (directories, leaves) = fs.listStatus(path).partition(_.isDir)
>     leaves ++ directories.flatMap(f => listLeafStatuses(fs, f.getPath))
>   }
>
>   val baseStatus = fs.getFileStatus(basePath)
>   if (baseStatus.isDir) recurse(basePath) else Array(baseStatus)
> }
>
> --
> Best Regards!
> Yijie Shen
>
> On March 12, 2015 at 2:35:49 PM, Akhil Das (ak...@sigmoidanalytics.com) wrote:
>
> Hi
>
> We have a custom build that reads directories recursively. Currently we
> use it with fileStream like:
>
> val lines = ssc.fileStream[LongWritable, Text, TextInputFormat](
>   "/datadumps/",
>   (t: Path) => true, true, true)
>
> Setting the 4th argument to true enables the recursive read.
>
> You could give it a try:
> https://s3.amazonaws.com/sigmoidanalytics-builds/spark-1.2.0-bin-spark-1.2.0-hadoop2.4.0.tgz
>
> Thanks
> Best Regards
>
> On Wed, Mar 11, 2015 at 9:45 PM, Masf <masfwo...@gmail.com> wrote:
>
>> Hi all
>>
>> Is it possible to read folders recursively in order to read Parquet files?
>>
>> Thanks.
>>
>> --
>> Regards,
>> Miguel Ángel
>

--
Regards,
Miguel Ángel
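P.S. To make my question concrete, below is the untested kind of combination I have in mind: the same recursion as the listLeafStatuses method you quoted (rewritten here against the plain Hadoop FileSystem API, since SparkHadoopUtil is an internal class), keeping only the .parquet data files and handing the paths to SQLContext.parquetFile. The variadic parquetFile(paths: String*) assumes Spark 1.3; on 1.2 it takes a single path, so each path would have to be loaded separately and the results unioned. The /datadumps path and the table name are just placeholders.

import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object RecursiveParquet {
  // Same recursion as SparkHadoopUtil.listLeafStatuses quoted above:
  // collect every leaf file under basePath.
  def listLeafStatuses(fs: FileSystem, basePath: Path): Seq[FileStatus] = {
    val baseStatus = fs.getFileStatus(basePath)
    if (baseStatus.isDir) {
      val (directories, leaves) = fs.listStatus(basePath).partition(_.isDir)
      leaves ++ directories.flatMap(d => listLeafStatuses(fs, d.getPath))
    } else {
      Seq(baseStatus)
    }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RecursiveParquet"))
    val sqlContext = new SQLContext(sc)
    val fs = FileSystem.get(sc.hadoopConfiguration)

    // Keep only the .parquet data files; skip _SUCCESS/_metadata markers.
    val parquetPaths = listLeafStatuses(fs, new Path("/datadumps"))
      .map(_.getPath.toString)
      .filter(_.endsWith(".parquet"))

    // Spark 1.3: parquetFile takes varargs. On Spark 1.2 it takes a
    // single path, so load each path and union the resulting SchemaRDDs.
    val data = sqlContext.parquetFile(parquetPaths: _*)
    data.registerTempTable("my_parquet_data")
  }
}

Is something along these lines the right way to do it?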