An annotation was introduced in 2.37 to make sure we get the same field order in a schema inferred from a POJO:
https://javadoc.io/doc/org.apache.beam/beam-sdks-java-core/latest/org/apache/beam/sdk/schemas/annotations/SchemaFieldNumber.html
With that annotation, schemaRegistry.getSchema(dataClass) should give you a schema with the same field order.

On Wed, Apr 6, 2022 at 1:35 AM Alexey Romanenko <aromanenko....@gmail.com> wrote:

> Thanks for the answers, Reuven. Please see the additional questions inline.
>
> On 5 Apr 2022, at 20:07, Reuven Lax <re...@google.com> wrote:
>
> On Tue, Apr 5, 2022 at 9:55 AM Alexey Romanenko <aromanenko....@gmail.com>
> wrote:
>
>>
>> So, the different field order matters.
>>
>> Additionally, since "Schema.equals()" is used in "Row.equals()", it
>> means that two Rows with differently ordered schemas but the same values
>> will be considered different rows. Is that correct?
>>
>
> Yes, but there are ways of dealing with this:
>
>
> But what is the point of this? Why can the field order be important, and
> under which circumstances?
>
> 1. If using Dataflow, the pipeline update feature allows you to update to
> a compatible schema (i.e. one in which the fields have the same names but a
> different order).
> 2. You can use the Convert transform to convert rows to a compatible schema
> with a different order.
>
>
> Well, for now it’s mostly related to unit tests (e.g.
> AvroSchemaTest.testPojoRecordToRow()) where we compare a manually created
> row with another row that is created from a POJO with AvroRecordSchema. I’m
> playing with an Avro version upgrade [1] and the test fails because of
> some changes in Avro: it now creates an Avro schema with a different order
> of fields. So I’m actually thinking about what we can do here.
>
> [1] https://github.com/apache/beam/pull/17246
>
>
>> At the same time, when generating a schema with different schema
>> providers, the order of fields can be non-deterministic in some cases.
>>
>> For example, "GetterBasedSchemaProvider.toRowFunction(TypeDescriptor)"
>> says [3] that:
>> *- "schemaFor is non deterministic - it might return fields in an
>> arbitrary order. The reason why is that Java reflection does not guarantee
>> the order in which it returns fields and methods, and these schemas are
>> often based on reflective analysis of classes."*
>>
>> So, iiuc, it means that potentially we can have the "same" schema but
>> with a different field order for the same class (a POJO, for example)
>> when it is generated on different JVMs.
>>
>
> Correct, and see above.
>
>
>>
>> And now the actual questions:
>> - Should two Rows with the same field values but with two schemas of
>> different field order be considered two different rows or not?
>> - Is the behaviour explained above what was expected by the initial
>> schema design?
>> - If field order is so important, then why?
>>
>> PS: My question is actually related to
>> "AvroRecordSchema().toRowFunction()" but I guess other SchemaProviders
>> can also be affected.
>>
>>
>> --
>> Alexey
>>
>> [1]
>> https://beam.apache.org/documentation/programming-guide/#schema-definition
>> [2]
>> https://github.com/apache/beam/blob/0262ee53c6018d929a8a40fdf66735cc7e934951/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java#L303
>> [3]
>> https://github.com/apache/beam/blob/0262ee53c6018d929a8a40fdf66735cc7e934951/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/GetterBasedSchemaProvider.java#L91
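
PS: To illustrate why an explicit field number sidesteps the reflection-order problem quoted above, here is a minimal plain-Java sketch. It does not use Beam at all: FieldNumber, Person and orderedFieldNames are hypothetical names made up for this example, and the real SchemaFieldNumber annotation may be implemented quite differently. The only point is that sorting by a declared number gives the same order no matter how the JVM returns the fields.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FieldOrderSketch {

  // Hypothetical stand-in for an ordering annotation; NOT Beam's SchemaFieldNumber.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface FieldNumber {
    int value();
  }

  // Example POJO: the declaration order deliberately differs from the numbering.
  static class Person {
    @FieldNumber(1) public String name;
    @FieldNumber(0) public long id;
    @FieldNumber(2) public boolean active;
  }

  // Returns the annotated field names sorted by their explicit number, so the
  // result does not depend on the order in which reflection returns fields.
  static List<String> orderedFieldNames(Class<?> clazz) {
    List<Field> fields = new ArrayList<>();
    for (Field f : clazz.getDeclaredFields()) {
      if (f.isAnnotationPresent(FieldNumber.class)) {
        fields.add(f);
      }
    }
    fields.sort(
        Comparator.comparingInt((Field f) -> f.getAnnotation(FieldNumber.class).value()));
    List<String> names = new ArrayList<>();
    for (Field f : fields) {
      names.add(f.getName());
    }
    return names;
  }

  public static void main(String[] args) {
    // Prints the names in numbered order rather than declaration order.
    System.out.println(orderedFieldNames(Person.class));
  }
}
```

For the Person class above this yields id, name, active, regardless of the order getDeclaredFields() happens to use.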