I changed the code to the following:

JavaPairRDD<NullWritable, String> rdd = sc.newAPIHadoopFile(inputFile,
        ParquetInputFormat.class, NullWritable.class, String.class, mrConf);
JavaRDD<String> words = rdd.values().flatMap(
        new FlatMapFunction<String, String>() {
            public Iterable<String> call(String x) {
                return Arrays.asList(x.split(","));
            }
        });
With this I get the following error:

java.lang.NullPointerException
    at org.apache.parquet.hadoop.ParquetInputFormat.getReadSupportInstance(ParquetInputFormat.java:280)
    at org.apache.parquet.hadoop.ParquetInputFormat.getReadSupport(ParquetInputFormat.java:257)
    at org.apache.parquet.hadoop.ParquetInputFormat.createRecordReader(ParquetInputFormat.java:245)
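
Looking at the stack trace, my guess (unverified) is that the NPE happens in getReadSupportInstance because no ReadSupport class has been set on mrConf, so ParquetInputFormat has nothing to instantiate. Is something like the line below required before calling newAPIHadoopFile? GroupReadSupport here is just the example read support that ships with parquet-hadoop; I'm not sure it is the right one for my case.

import org.apache.parquet.hadoop.ParquetInputFormat;
import org.apache.parquet.hadoop.example.GroupReadSupport;

// guess: register a ReadSupport so ParquetInputFormat knows how to materialize records
ParquetInputFormat.setReadSupportClass(mrConf, GroupReadSupport.class);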

My input is a file of simple comma-separated employee records. I created a Hive
table with STORED AS PARQUET and then loaded it from another Hive table... I can
treat the records as plain lines, since all I need is a word count. So, do my
key class and value class make sense?
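
From what I can tell, the ParquetInputFormat record reader emits Void keys and values of whatever type the configured ReadSupport materializes, so NullWritable/String probably are not right. Below is the direction I was thinking of, purely as an untested sketch based on the example Group API from parquet-hadoop (Void/Group classes and GroupReadSupport are my assumptions):

import java.util.Arrays;

import org.apache.parquet.example.data.Group;
import org.apache.parquet.hadoop.ParquetInputFormat;
import org.apache.parquet.hadoop.example.GroupReadSupport;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;

// untested sketch: register the example ReadSupport and read records as Groups
ParquetInputFormat.setReadSupportClass(mrConf, GroupReadSupport.class);

JavaPairRDD<Void, Group> rdd = sc.newAPIHadoopFile(inputFile,
        ParquetInputFormat.class, Void.class, Group.class, mrConf);

JavaRDD<String> words = rdd.values().flatMap(
        new FlatMapFunction<Group, String>() {
            public Iterable<String> call(Group record) {
                // toString() is only for illustration; a real job would read
                // the individual fields of the Group
                return Arrays.asList(record.toString().split("\\s+"));
            }
        });

Does that look closer to what is needed?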

Thanks a lot for your support.
Best..


