[ https://issues.apache.org/jira/browse/BEAM-5646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653312#comment-16653312 ]

Gleb Kanterov commented on BEAM-5646:
-------------------------------------

[~kedin] do you have any thoughts, or perhaps you can mention somebody else?

> Equality is broken for Rows with BYTES field
> --------------------------------------------
>
>                 Key: BEAM-5646
>                 URL: https://issues.apache.org/jira/browse/BEAM-5646
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-sql
>    Affects Versions: 2.7.0
>            Reporter: Gleb Kanterov
>            Assignee: Xu Mingmin
>            Priority: Major
>
> The problem is with `org.apache.beam.sdk.values.Row#equals` and `hashCode`.
> Java arrays use reference equality instead of comparing contents. Row stores
> fields of type BYTES as `byte[]`.
> These failing tests illustrate the problem:
> {code:java}
> @Test
> public void testByteArrayEquality() {
>   byte[] a0 = new byte[16];
>   byte[] b0 = new byte[16];
>   Schema schema = Schema.of(Schema.Field.of("bytes", Schema.FieldType.BYTES));
>   Row a = Row.withSchema(schema).addValue(a0).build();
>   Row b = Row.withSchema(schema).addValue(b0).build();
>   Assert.assertEquals(a, b);
> }
> @Test
> public void testByteBufferEquality() {
>   byte[] a0 = new byte[16];
>   byte[] b0 = new byte[16];
>   Schema schema = Schema.of(Schema.Field.of("bytes", Schema.FieldType.BYTES));
>   Row a = Row.withSchema(schema).addValue(ByteBuffer.wrap(a0)).build();
>   Row b = Row.withSchema(schema).addValue(ByteBuffer.wrap(b0)).build();
>   Assert.assertEquals(a, b);
> }
> {code}
>
> Option 1. Fix by storing `byte[]` as `ByteBuffer`, or something simpler
> that doesn't have offsets. `Row#getValue` would return this type. For
> consistency, it would be preferable to change `Row#getBytes` in an
> incompatible way so that it matches `Row#getValue`, as is the case for the
> rest of the methods.
>
> Option 2. Do the same as Spark does: add `if (x instanceof byte[])` to
> `equals`. The problem in Spark is that the `hashCode` implementation isn't
> consistent with `equals`, see SPARK-25122.
>
> Option 3. Consider it intended behavior, and fix the
> `RowCoder#consistentWithEquals` implementation.
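
To make the discussion of Option 2 concrete, here is a minimal sketch of content-based equality for values that may include `byte[]`, with a `hashCode` kept consistent with `equals` (the inconsistency noted for Spark). It uses only the JDK. The class `ByteFieldEquality` and its methods are hypothetical names made up for illustration; they are not part of Beam, and this is not how `Row` is implemented.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical helper for illustration only; not part of Beam's Row.
final class ByteFieldEquality {

  private ByteFieldEquality() {}

  // Compares one field value; byte[] is compared by contents, not by reference.
  static boolean valueEquals(Object a, Object b) {
    if (a instanceof byte[] && b instanceof byte[]) {
      return Arrays.equals((byte[]) a, (byte[]) b);
    }
    return Objects.equals(a, b);
  }

  // Hashes one field value in a way that agrees with valueEquals.
  static int valueHashCode(Object v) {
    if (v instanceof byte[]) {
      return Arrays.hashCode((byte[]) v);
    }
    return Objects.hashCode(v);
  }

  // Compares two lists of field values position by position.
  static boolean valuesEqual(Object[] a, Object[] b) {
    if (a.length != b.length) {
      return false;
    }
    for (int i = 0; i < a.length; i++) {
      if (!valueEquals(a[i], b[i])) {
        return false;
      }
    }
    return true;
  }

  // Combines per-value hashes with the usual 31-multiplier scheme.
  static int valuesHashCode(Object[] values) {
    int result = 1;
    for (Object v : values) {
      result = 31 * result + valueHashCode(v);
    }
    return result;
  }

  public static void main(String[] args) {
    byte[] a0 = new byte[16];
    byte[] b0 = new byte[16];

    // Arrays inherit Object#equals, so this is a reference comparison.
    System.out.println(a0.equals(b0)); // false
    // Content comparison needs java.util.Arrays.
    System.out.println(Arrays.equals(a0, b0)); // true

    Object[] left = {a0, "key"};
    Object[] right = {b0, "key"};
    System.out.println(valuesEqual(left, right)); // true
    System.out.println(valuesHashCode(left) == valuesHashCode(right)); // true

    // ByteBuffer itself compares the remaining bytes by content.
    System.out.println(ByteBuffer.wrap(a0).equals(ByteBuffer.wrap(b0))); // true
  }
}
```

The point of pairing `valueEquals` with `valueHashCode` is that any special case added to `equals` for `byte[]` needs the same special case in `hashCode`; otherwise equal values can land in different hash buckets.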