Re: Read parquet folders recursively

2015-03-12 Thread Akhil Das
With fileStream you are free to plug in any InputFormat; in your case you can easily plug in ParquetInputFormat. Here are some parquet-hadoop examples: https://github.com/Parquet/parquet-mr/tree/master/parquet-hadoop/src/main/java/parquet/hadoop/example

Thanks
Best Regards
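A rough sketch of what that plumbing could look like, using the pre-Apache parquet-mr ExampleInputFormat from the linked repo, which emits (Void, Group) pairs. The input path and batch interval are placeholders, and this is untested:

    import org.apache.spark.SparkContext
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import parquet.example.data.Group
    import parquet.hadoop.example.ExampleInputFormat

    // ExampleInputFormat extends ParquetInputFormat[Group], so it can be
    // plugged into fileStream directly as the F type parameter.
    val sc = new SparkContext("local[2]", "parquet-stream")
    val ssc = new StreamingContext(sc, Seconds(30))

    val records = ssc.fileStream[Void, Group, ExampleInputFormat]("/some/parquet/dir")
    records.map(_._2.toString).print()

    ssc.start()
    ssc.awaitTermination()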

Re: Read parquet folders recursively

2015-03-12 Thread Masf
Hi. Thanks for your answers, but isn't it necessary to use the parquetFile method on org.apache.spark.sql.SQLContext to read parquet files? How can I combine your solution with a call to this method? Thanks!!

Regards

Re: Read parquet folders recursively

2015-03-12 Thread Akhil Das
Hi

We have a custom build to read directories recursively. Currently we use it with fileStream like:

    val lines = ssc.fileStream[LongWritable, Text, TextInputFormat](
      "/datadumps/", (t: Path) => true, true, true)

Making the 4th argument true enables recursive reading. You could give it a try.

Re: Read parquet folders recursively

2015-03-12 Thread Yijie Shen
org.apache.spark.deploy.SparkHadoopUtil has a method:

    /**
     * Get [[FileStatus]] objects for all leaf children (files) under the given base path. If the
     * given path points to a file, return a single-element collection containing [[FileStatus]] of
     * that file.
     */
    def listLeafStatuses(fs: FileSystem, basePath: Path): Seq[FileStatus]
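Putting that together with SQLContext.parquetFile would answer the question above. A minimal sketch, assuming an existing SparkContext sc and a Spark 1.3-era SQLContext whose parquetFile accepts multiple paths; the base path is a placeholder and the .parquet suffix check is just one way to skip non-data files:

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.deploy.SparkHadoopUtil
    import org.apache.spark.sql.SQLContext

    // Recursively list every leaf file under the base path, keep the
    // parquet files, and pass them all to SQLContext.parquetFile.
    val fs = FileSystem.get(sc.hadoopConfiguration)
    val parquetPaths = SparkHadoopUtil.get
      .listLeafStatuses(fs, new Path("/some/base/dir"))
      .map(_.getPath.toString)
      .filter(_.endsWith(".parquet"))

    val sqlContext = new SQLContext(sc)
    val df = sqlContext.parquetFile(parquetPaths: _*)
    df.printSchema()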

Read parquet folders recursively

2015-03-11 Thread Masf
Hi all

Is it possible to read folders recursively in order to read parquet files? Thanks.

--
Regards.
Miguel Ángel