Felix Neutatz created FLINK-1271:
------------------------------------
Summary: Extend HadoopOutputFormat and HadoopInputFormat to handle Void.class
Key: FLINK-1271
URL: https://issues.apache.org/jira/browse/FLINK-1271
Project: Flink
Issue Type: Wish
Components: Hadoop Compatibility
Reporter: Felix Neutatz
Priority: Minor
Parquet, one of the most popular and efficient columnar storage formats in the
Hadoop ecosystem, uses Void.class as its key type!
At the moment, only key types that extend Writable are allowed.
For example, we would need to be able to do something like:

HadoopInputFormat<Void, AminoAcid> hadoopInputFormat =
    new HadoopInputFormat<Void, AminoAcid>(
        new ParquetThriftInputFormat<AminoAcid>(), Void.class, AminoAcid.class, job);
ParquetThriftInputFormat.addInputPath(job, new Path("newpath"));
ParquetThriftInputFormat.setReadSupportClass(job, AminoAcid.class);

// Create a Flink job with it
DataSet<Tuple2<Void, AminoAcid>> data = env.createInput(hadoopInputFormat);

where AminoAcid is a generated Thrift class in this case.
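The restriction comes down to the generic bound on the key type. A minimal,
self-contained sketch of the issue (the Writable interface and both wrapper
classes here are hypothetical stand-ins for the real Hadoop/Flink types, not
actual API):

```java
// Hypothetical stand-in for org.apache.hadoop.io.Writable
interface Writable {}

// Current-style wrapper: the key type is bounded by Writable, so
// Void (which does not implement Writable) is rejected at compile time.
class BoundedInputFormat<K extends Writable, V> {}

// Relaxed wrapper: no bound on K, so Void.class is accepted as a key class.
class RelaxedInputFormat<K, V> {
    private final Class<K> keyClass;

    RelaxedInputFormat(Class<K> keyClass) {
        this.keyClass = keyClass;
    }

    Class<K> getKeyClass() {
        return keyClass;
    }
}

public class VoidKeyDemo {
    public static void main(String[] args) {
        // new BoundedInputFormat<Void, String>();  // does not compile: Void is not a Writable
        RelaxedInputFormat<Void, String> format = new RelaxedInputFormat<>(Void.class);
        System.out.println(format.getKeyClass().getSimpleName()); // prints "Void"
    }
}
```

Dropping (or relaxing) the Writable bound in the same way on the real wrappers
would let Void.class pass through as a key class.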
However, I have already figured out how to write Parquet files by creating a
class that extends HadoopOutputFormat.
Now we need to discuss the best approach to make the Parquet integration
happen.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)