The standard VARINT coder is used for all sorts of integer values (e.g. the output of the CountElements transform), but the vast majority of those values likely need far fewer than a full 64 bits. In Python, declaring an element type to be int will use this coder. On the other hand, using a VarInt format for int8 seems quite wasteful. Where exactly the cutoff should be is probably arbitrary, but the 32-bit int type is often used as the generic (and often small-ish) integer type in Java, whereas int16 is an explicit choice where one knows that 16 bits is good enough, but 8 isn't.
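To put rough numbers on that trade-off, here is a minimal sketch in Python, assuming the usual base-128 varint layout (7 payload bits per byte, high bit used as a continuation flag). It is not Beam's actual coder code: varint_encode and FIXED_WIDTHS are hypothetical names used only for illustration, and negative values are deliberately left out.

```python
# Hypothetical illustration only; not taken from the Beam SDKs.
# Fixed-width sizes in bytes for the integer types discussed in this thread.
FIXED_WIDTHS = {"int8": 1, "int16": 2, "int32": 4, "int64": 8}


def varint_encode(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low-order group first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F  # next 7 payload bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)  # last byte: continuation bit clear
            return bytes(out)


if __name__ == "__main__":
    # Compare the varint size of each type's largest positive value
    # against that type's fixed-width size.
    for name, width in FIXED_WIDTHS.items():
        max_value = 2 ** (8 * width - 1) - 1
        size = len(varint_encode(max_value))
        print(f"{name}: max={max_value} varint={size}B fixed={width}B")
```

Under these assumptions, small values such as 127 take a single byte as a varint, while 32767 takes three bytes, one more than a fixed 16-bit encoding, which is the sense in which the narrow types gain little from a variable-width format.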
It looks like Go uses the VarInt encoding everywhere: https://github.com/apache/beam/blob/release-2.14.0/sdks/go/pkg/beam/coder.go#L135 . Python, as mentioned, uses VarInt encoding everywhere as well.

(There's also the question of whether we want to introduce StandardCoders for all of these, or if we'd rather move to using Schemas over Coders and just define them as part of the RowCoder.)

On Tue, Jul 30, 2019 at 8:30 PM Brian Hulette <[email protected]> wrote:

> Forgot to include a link to the code. The mapping from primitive type to
> coders can be found here:
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoder.java#L44
>
> On Tue, Jul 30, 2019 at 11:24 AM Brian Hulette <[email protected]>
> wrote:
>
>> Currently the coders used for integer types in RowCoder (and thus
>> SchemaCoder) are inconsistent. For int32 and int64, we use VarIntCoder and
>> VarLongCoder which encode those types with variable width, but for byte and
>> int16 we use ByteCoder and BigEndianShortCoder, which are fixed width.
>>
>> Is it a conscious choice to use variable width coders just for the larger
>> width integers (where they could have the most benefit), or should we
>> consider normalizing these coders to always be fixed width?
>>
>
