Create a Hive table x.
Load your CSV data into table x: LOAD DATA INPATH 'file/path' INTO TABLE x;
Create a Hive table y with the same structure as x, except add STORED AS PARQUET.
INSERT OVERWRITE TABLE y SELECT * FROM x;
This would get you Parquet files under /user/hive/warehouse/y (as an example).
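Put together, the steps above look roughly like this. This is only a sketch: the single `word STRING` column is a placeholder for your real schema, and 'file/path' is whatever HDFS location your CSV sits in.

```sql
-- 1. Text-backed table for the raw CSV (column list is illustrative)
CREATE TABLE x (word STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- 2. Load the CSV into x
LOAD DATA INPATH 'file/path' INTO TABLE x;

-- 3. Same structure as x, but backed by Parquet files
CREATE TABLE y (word STRING)
STORED AS PARQUET;

-- 4. Rewriting the rows materializes them as Parquet under y's warehouse dir
INSERT OVERWRITE TABLE y SELECT * FROM x;
```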
I changed the code to the following (the body of call() is the obvious
whitespace split for a word-count):

JavaPairRDD<NullWritable, String> rdd = sc.newAPIHadoopFile(inputFile,
        ParquetInputFormat.class, NullWritable.class, String.class, mrConf);
JavaRDD<String> words = rdd.values().flatMap(
        new FlatMapFunction<String, String>() {
            public Iterable<String> call(String line) {
                return Arrays.asList(line.split(" "));
            }
        });
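For reference, the splitting-and-counting logic that the Spark flatMap/reduce pipeline performs can be sketched in plain Java without Spark. `WordCount` and `countWords` are hypothetical names introduced here for illustration, not part of the original code:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the word-count logic, assuming whitespace-delimited words.
public class WordCount {
    public static Map<String, Integer> countWords(String line) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on runs of whitespace, like the flatMap step would per record
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum); // acts as the reduce step
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("to be or not to be"));
    }
}
```

In Spark the same two phases are split across the cluster: flatMap emits the words, and a reduceByKey (not shown in the snippet above) does the merging.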
Hi All,
I am a newbie to Spark/Hadoop.
I want to read a Parquet file and perform a simple word-count. Below is my
code; however, I get an error:
Exception in thread "main" java.io.IOException: No input paths specified in
job
at