With fileStream you are free to plug in any InputFormat; in your case, you
can easily plug in ParquetInputFormat. Here are some parquet-hadoop examples:
https://github.com/Parquet/parquet-mr/tree/master/parquet-hadoop/src/main/java/parquet/hadoop/example
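For instance, a minimal sketch (untested; assumes parquet-mr's parquet-hadoop
jar is on the classpath and that ExampleInputFormat wires in GroupReadSupport,
so records arrive as Group values with Void keys; the path and filter are only
illustrative):

import org.apache.hadoop.fs.Path
import parquet.example.data.Group
import parquet.hadoop.example.ExampleInputFormat

// Stream newly arriving parquet files as (Void, Group) pairs.
val parquetStream = ssc.fileStream[Void, Group, ExampleInputFormat](
  "/datadumps/", (p: Path) => p.getName.endsWith(".parquet"), true)

// Group.toString pretty-prints the record fields.
parquetStream.map(_._2.toString).print()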
Thanks
Best Regards
On Thu, Mar 12, 2015 at
Hi,
Thanks for your answers, but to read parquet files, is it necessary to use the
parquetFile method in org.apache.spark.sql.SQLContext? How can I combine your
solution with a call to that method?
Thanks!!
Regards
On Thu, Mar 12, 2015 at 8:34 AM, Yijie Shen henry.yijies...@gmail.com
Hi
We have a custom build that reads directories recursively. Currently we use it
with fileStream like this:

val lines = ssc.fileStream[LongWritable, Text, TextInputFormat](
  "/datadumps/", (t: Path) => true, true, true)

Making the 4th argument true makes it read recursively. You could give it a try.
org.apache.spark.deploy.SparkHadoopUtil has a method:
/**
 * Get [[FileStatus]] objects for all leaf children (files) under the given base path. If the
 * given path points to a file, return a single-element collection containing [[FileStatus]] of
 * that file.
 */
def listLeafStatuses(fs: FileSystem, basePath: Path): Seq[FileStatus]
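A rough sketch of how this could combine with SQLContext.parquetFile (untested;
assumes a Spark 1.3-style SQLContext where parquetFile takes varargs, that sc
and sqlContext are already in scope, and that the path and .parquet suffix
filter are only illustrative):

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.deploy.SparkHadoopUtil

// Recursively list every leaf file under the base path, keep the parquet
// ones, and pass them all to parquetFile at once.
val fs = FileSystem.get(sc.hadoopConfiguration)
val leaves = SparkHadoopUtil.get.listLeafStatuses(fs, new Path("/datadumps/"))
val parquetPaths = leaves.map(_.getPath.toString).filter(_.endsWith(".parquet"))
val df = sqlContext.parquetFile(parquetPaths: _*)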
Hi all,
Is it possible to read folders recursively in order to read parquet files?
Thanks.
--
Regards.
Miguel Ángel