You are missing the input path. mrConf is not the way to add input files —
Configuration.addResource() loads a configuration XML file; it does not set an
input path for the job. In Spark, try the DataFrame read functions or the
sc.textFile function instead.
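For example, something like the following sketch (assuming Spark 2.x, where
flatMap takes a function returning an Iterator; the path and the assumption
that the first column is a String are placeholders you would adapt):

```java
import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ParquetWordCount {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("parquet-word-count")
                .getOrCreate();

        // Let Spark handle ParquetInputFormat internally instead of
        // wiring it up through newAPIHadoopRDD by hand.
        Dataset<Row> df = spark.read().parquet("/path/to/input.parquet");

        // Assumes column 0 holds comma-separated text; split each
        // value into words and count them.
        JavaRDD<String> words = df.toJavaRDD().flatMap(
                row -> Arrays.asList(row.getString(0).split(",")).iterator());
        long count = words.count();

        System.out.println("word count: " + count);
        spark.stop();
    }
}
```

If the input were plain text rather than parquet, sc.textFile("/path/to/file")
would give you a JavaRDD<String> directly, with no InputFormat plumbing at all.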

Best
Ayan
On 23 Aug 2016 07:12, "shamu" <prashant...@hotmail.com> wrote:

> Hi All,
> I am a newbie to Spark/Hadoop.
> I want to read a parquet file and a perform a simple word-count. Below is
> my
> code, however I get an error:
> Exception in thread "main" java.io.IOException: No input paths specified in job
>         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:239)
>         at org.apache.parquet.hadoop.ParquetInputFormat.listStatus(ParquetInputFormat.java:349)
>         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
>         at org.apache.parquet.hadoop.ParquetInputFormat.getSplits(ParquetInputFormat.java:304)
>         at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:120)
>
> Below is my code. I guess I am missing some core concepts wrt Hadoop
> InputFormats and making them work with Spark. Could you please explain the
> cause and a solution to get this working?
> -----------------------------code snippet-----------------------------------
> JavaSparkContext sc = new JavaSparkContext(conf);
> org.apache.hadoop.conf.Configuration mrConf = new Configuration();
> mrConf.addResource(inputFile);
> JavaPairRDD<String, String> textInputFormatObjectJavaPairRDD =
>         sc.newAPIHadoopRDD(mrConf, ParquetInputFormat.class,
>                 String.class, String.class);
> JavaRDD<String> words = textInputFormatObjectJavaPairRDD.values().flatMap(
>         new FlatMapFunction<String, String>() {
>             public Iterable<String> call(String x) {
>                 return Arrays.asList(x.split(","));
>             }
>         });
> long x = words.count();
>
> --thanks!
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/word-count-on-parquet-file-tp27581.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
