Felix Neutatz created FLINK-1271:
------------------------------------

             Summary: Extend HadoopOutputFormat and HadoopInputFormat to handle Void.class
                 Key: FLINK-1271
                 URL: https://issues.apache.org/jira/browse/FLINK-1271
             Project: Flink
          Issue Type: Wish
          Components: Hadoop Compatibility
            Reporter: Felix Neutatz
            Priority: Minor
Parquet, one of the best-known and most efficient column-store formats in Hadoop, uses Void.class as its key type! At the moment, only key types that implement Writable are allowed. For example, we would need to be able to do something like:

HadoopInputFormat<Void, AminoAcid> hadoopInputFormat = new HadoopInputFormat<>(new ParquetThriftInputFormat<AminoAcid>(), Void.class, AminoAcid.class, job);
ParquetThriftInputFormat.addInputPath(job, new Path("newpath"));
ParquetThriftInputFormat.setReadSupportClass(job, AminoAcid.class);

// Create a Flink job with it
DataSet<Tuple2<Void, AminoAcid>> data = env.createInput(hadoopInputFormat);

where AminoAcid is a generated Thrift class in this case.

However, I figured out how to write Parquet files by creating a class that extends HadoopOutputFormat. Now we have to discuss what the best approach is to make the Parquet integration happen.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
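To make the type-bound problem concrete, here is a simplified, self-contained sketch. It is not actual Flink code: the Writable interface below is a stand-in for org.apache.hadoop.io.Writable, and StrictFormat/RelaxedFormat are hypothetical classes that only illustrate why a `K extends Writable` bound rejects Void.class, while an unbounded key type can carry Void keys as null.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class VoidKeyDemo {
    // Stand-in for org.apache.hadoop.io.Writable (for illustration only).
    interface Writable {}

    // Current situation: the key type parameter is bounded by Writable, so a
    // HadoopInputFormat-style class cannot be instantiated with Void keys.
    // new StrictFormat<Void, String>() would NOT compile: Void does not implement Writable.
    static class StrictFormat<K extends Writable, V> {}

    // Wished-for relaxation: no bound on K; a Void key is simply carried as null.
    static class RelaxedFormat<K, V> {
        private final Iterator<V> values;

        RelaxedFormat(List<V> values) { this.values = values.iterator(); }

        // Emulates reading a (key, value) record where the key type is Void.
        Object[] nextRecord() {
            return new Object[] { null, values.next() }; // a Void key is always null
        }
    }

    public static void main(String[] args) {
        RelaxedFormat<Void, String> fmt = new RelaxedFormat<>(Arrays.asList("a", "b"));
        Object[] rec = fmt.nextRecord();
        System.out.println(rec[0] == null && "a".equals(rec[1])); // prints "true"
    }
}
```

In this sketch, relaxing the bound means the format must also skip (de)serializing the key instead of calling Writable methods on it, which is the part the real HadoopInputFormat/HadoopOutputFormat would need to handle for Void keys.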