Kostas Tzoumas created TEZ-1704:
-----------------------------------

             Summary: Derive from Edge configs
                 Key: TEZ-1704
                 URL: https://issues.apache.org/jira/browse/TEZ-1704
             Project: Apache Tez
          Issue Type: Wish
    Affects Versions: 0.5.2
            Reporter: Kostas Tzoumas

I am working on making Apache Flink run on top of Tez. Flink uses its own serialization and deserialization machinery and does not rely on Hadoop Writables. To pass data between Tez processors, we encapsulate objects that are (de)serialized by Flink inside a Hadoop Writable, and use that Writable as the value in the Tez key-value pairs that operators read and write.

This requires a Flink type serializer object to be available in the Tez reader and input classes. To do that, we had to create a custom input and a custom input reader that derive from AbstractLogicalInput and KeyValueReader respectively:

https://github.com/ktzoumas/incubator-flink/blob/tez-support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/runtime/input/FlinkUnorderedKVInput.java
https://github.com/ktzoumas/incubator-flink/blob/tez-support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/runtime/input/FlinkUnorderedKVReader.java

This also meant creating custom edge configs that return the correct input type (in this case FlinkUnorderedKVInput):

https://github.com/ktzoumas/incubator-flink/blob/tez-support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/dag/FlinkUnorderedKVEdgeConfig.java
https://github.com/ktzoumas/incubator-flink/blob/tez-support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/dag/FlinkUnorderedPartitionedKVEdgeConfig.java

To create these, we needed to derive from UnorderedKVEdgeConfig and UnorderedPartitionedKVEdgeConfig respectively, and change some of their fields from private to protected (a patch showing the changes is attached).

We are not using the sorting facilities of Tez; instead, we use the Flink sort operators inside Tez processors. This is why the Ordered classes are not modified.

I was wondering whether there is a better way to do this and, if not, whether the change described in the patch would be acceptable for the next Tez release.
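
To illustrate why the reading side needs the serializer, here is a minimal, self-contained sketch of the wrapping idea. It is not the code from the linked branch. The nested Writable interface is a local stand-in that only mirrors the two methods of org.apache.hadoop.io.Writable, so the sketch runs without Hadoop on the classpath. TypeSerializer, WritableWrapper, STRING_SERIALIZER and roundTrip are hypothetical names chosen for this example and are not Flink's or Tez's actual API. Passing the serializer through the constructor is also an assumption made purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class WritableWrapperSketch {

    /** Local stand-in mirroring the two methods of org.apache.hadoop.io.Writable. */
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Hypothetical stand-in for a Flink type serializer (not Flink's real interface). */
    interface TypeSerializer<T> {
        void serialize(T record, DataOutput out) throws IOException;
        T deserialize(DataInput in) throws IOException;
    }

    /** Writable that delegates (de)serialization of its payload to the serializer. */
    static final class WritableWrapper<T> implements Writable {
        private final TypeSerializer<T> serializer;
        private T value;

        WritableWrapper(TypeSerializer<T> serializer) {
            this.serializer = serializer;
        }

        T get() { return value; }

        void set(T value) { this.value = value; }

        @Override
        public void write(DataOutput out) throws IOException {
            serializer.serialize(value, out);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // The bytes carry no type information, so the reading side can only
            // rebuild the payload if it holds a matching serializer.
            value = serializer.deserialize(in);
        }
    }

    /** Example serializer for String payloads, used only by this sketch. */
    static final TypeSerializer<String> STRING_SERIALIZER = new TypeSerializer<String>() {
        @Override
        public void serialize(String record, DataOutput out) throws IOException {
            out.writeUTF(record);
        }

        @Override
        public String deserialize(DataInput in) throws IOException {
            return in.readUTF();
        }
    };

    /** Writes a record through one wrapper and reads it back through a fresh one. */
    static <T> T roundTrip(T record, TypeSerializer<T> serializer) throws IOException {
        WritableWrapper<T> outgoing = new WritableWrapper<>(serializer);
        outgoing.set(record);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        outgoing.write(new DataOutputStream(bytes));

        WritableWrapper<T> incoming = new WritableWrapper<>(serializer);
        incoming.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return incoming.get();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hello tez", STRING_SERIALIZER));
    }
}
```

In this sketch the receiving wrapper can only be constructed once a serializer is at hand, which is the same constraint that led us to the custom input and reader classes described above.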