Hi All,
I am a newbie to Spark/Hadoop.
I want to read a Parquet file and perform a simple word count. When I run my
code (below), I get this error:
Exception in thread "main" java.io.IOException: No input paths specified in job
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:239)
        at org.apache.parquet.hadoop.ParquetInputFormat.listStatus(ParquetInputFormat.java:349)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
        at org.apache.parquet.hadoop.ParquetInputFormat.getSplits(ParquetInputFormat.java:304)
        at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:120)

Below is my code. I guess I am missing some core concepts about Hadoop
InputFormats and how to make them work with Spark. Could you please explain
the cause and how to get this working?
----------------------------- code snippet -----------------------------
JavaSparkContext sc = new JavaSparkContext(conf);

// Add the Parquet file as a configuration resource (this is where I try
// to point the job at the input)
org.apache.hadoop.conf.Configuration mrConf = new Configuration();
mrConf.addResource(inputFile);

// Read the file through ParquetInputFormat as (key, value) string pairs
JavaPairRDD<String, String> parquetPairRDD = sc.newAPIHadoopRDD(
        mrConf, ParquetInputFormat.class, String.class, String.class);

// Split each value on commas to get individual words
JavaRDD<String> words = parquetPairRDD.values().flatMap(
        new FlatMapFunction<String, String>() {
            public Iterable<String> call(String x) {
                return Arrays.asList(x.split(","));
            }
        });

long wordCount = words.count();
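
Reading the stack trace, my guess is that mrConf.addResource(inputFile) only
registers the file as a configuration resource and never sets it as an input
path, hence "No input paths specified in job". Is something like the sketch
below the right direction? This is just a guess assuming parquet-hadoop's
example GroupReadSupport; the g.getString(0, 0) part is a placeholder that
assumes the first column of each record is a comma-separated string.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.hadoop.ParquetInputFormat;
import org.apache.parquet.hadoop.example.GroupReadSupport;

// Use a Job to carry the input path and read support into the configuration
Job job = Job.getInstance(new Configuration());
FileInputFormat.addInputPath(job, new Path(inputFile));              // actually register the input path
ParquetInputFormat.setReadSupportClass(job, GroupReadSupport.class); // materialize records as Group objects

// GroupReadSupport produces (Void, Group) pairs, not (String, String)
JavaPairRDD<Void, Group> parquetRDD = sc.newAPIHadoopRDD(
        job.getConfiguration(), ParquetInputFormat.class, Void.class, Group.class);

// Placeholder: assumes field 0 of each record holds a comma-separated string
JavaRDD<String> words = parquetRDD.values().flatMap(
        new FlatMapFunction<Group, String>() {
            public Iterable<String> call(Group g) {
                return Arrays.asList(g.getString(0, 0).split(","));
            }
        });
long count = words.count();

Or is the DataFrame API (e.g. sqlContext.read().parquet(inputFile)) the
recommended way to do this, instead of dropping down to newAPIHadoopRDD?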

--thanks!