Is org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders supposed to be supported ?

Etienne Chauchot Tue, 10 Dec 2019 00:37:27 -0800

Hi all,

I have a question about the testFlattenMultipleCoders test:


This test uses 2 collections

1. long and null data encoded using NullableCoder(BigEndianLongCoder)

2. long data encoded using VarLongCoder

It then flattens the 2 collections and sets the coder of the resulting collection to NullableCoder(VarLongCoder).

Most runners translate flatten as a simple union of the 2 PCollections without any re-encoding. As a result, these runners exclude this test from their test set because of coder issues. For example, Flink raises an exception if the type of the elements in PCollection1 differs from the type of the elements in PCollection2 in the flatten translation. Another example is the direct runner and the Spark (RDD-based) runner, which do not exclude this test simply because they don't need to serialize elements, so they never even call the coders.
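To make the coder issue concrete, here is a minimal standalone sketch in plain Java. It does not use Beam at all: the encode/decode helpers are hypothetical stand-ins for the three coders, and the byte layouts (a 0/1 marker byte for the nullable wrapper, 8 big-endian bytes, and a 7-bits-per-byte variable-length long) are my assumptions about the wire formats, not something taken from the runners. It shows what can happen when bytes written with the input coders are read with the output coder after a plain union, and what a re-encoding step would look like.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class CoderMismatchSketch {

  // Stand-in for NullableCoder(BigEndianLongCoder), assumed layout:
  // marker byte (0 = null, 1 = present), then 8 big-endian bytes.
  static byte[] encodeNullableBigEndian(Long value) {
    if (value == null) {
      return new byte[] {0};
    }
    // ByteBuffer writes in big-endian order by default.
    return ByteBuffer.allocate(9).put((byte) 1).putLong(value).array();
  }

  static Long decodeNullableBigEndian(byte[] data) {
    ByteBuffer in = ByteBuffer.wrap(data);
    return readMarker(in) ? Long.valueOf(in.getLong()) : null;
  }

  // Stand-in for VarLongCoder, assumed layout: 7 payload bits per byte,
  // high bit set on every byte except the last.
  static byte[] encodeVarLong(long value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeVarLong(value, out);
    return out.toByteArray();
  }

  static long decodeVarLong(byte[] data) {
    return readVarLong(ByteBuffer.wrap(data));
  }

  // Stand-in for NullableCoder(VarLongCoder): marker byte, then a var-long.
  static byte[] encodeNullableVarLong(Long value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    if (value == null) {
      out.write(0);
    } else {
      out.write(1);
      writeVarLong(value, out);
    }
    return out.toByteArray();
  }

  static Long decodeNullableVarLong(byte[] data) {
    ByteBuffer in = ByteBuffer.wrap(data);
    return readMarker(in) ? Long.valueOf(readVarLong(in)) : null;
  }

  // Re-encoding: decode with the input coder, encode with the output coder.
  static byte[] reencodeFromNullableBigEndian(byte[] data) {
    return encodeNullableVarLong(decodeNullableBigEndian(data));
  }

  static byte[] reencodeFromVarLong(byte[] data) {
    return encodeNullableVarLong(decodeVarLong(data));
  }

  private static boolean readMarker(ByteBuffer in) {
    byte marker = in.get();
    if (marker == 0) return false;
    if (marker == 1) return true;
    throw new IllegalStateException("unexpected null marker: " + marker);
  }

  private static void writeVarLong(long v, ByteArrayOutputStream out) {
    while ((v & ~0x7FL) != 0) {
      out.write((int) ((v & 0x7F) | 0x80));
      v >>>= 7;
    }
    out.write((int) v);
  }

  private static long readVarLong(ByteBuffer in) {
    long result = 0;
    int shift = 0;
    byte b;
    do {
      b = in.get(); // throws BufferUnderflowException on truncated input
      result |= (long) (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return result;
  }

  public static void main(String[] args) {
    byte[] first = encodeNullableBigEndian(5L); // element of collection 1
    byte[] second = encodeVarLong(0L);          // element of collection 2
    System.out.println(Arrays.toString(first));  // [1, 0, 0, 0, 0, 0, 0, 0, 5]
    System.out.println(Arrays.toString(second)); // [0]

    // Plain union: bytes are read with the output coder as-is.
    System.out.println(decodeNullableVarLong(first));  // 0 (was 5)
    System.out.println(decodeNullableVarLong(second)); // null (was 0)

    // With a re-encoding step, the values survive.
    System.out.println(decodeNullableVarLong(reencodeFromNullableBigEndian(first))); // 5
    System.out.println(decodeNullableVarLong(reencodeFromVarLong(second)));          // 0
  }
}
```

Under these assumed layouts, the mismatch does not always fail loudly: some values decode silently to a different value, others throw. That is the kind of behaviour a runner would see only if it actually serializes the flattened elements.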

That means that having an output PCollection of the flatten with heterogeneous coders is not really tested, so it is not really supported.

Should we drop this test case (that is executed by no runner) or should we force each runner to re-encode?


Best

Etienne
