[ 
https://issues.apache.org/jira/browse/BEAM-5646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653312#comment-16653312
 ] 

Gleb Kanterov commented on BEAM-5646:
-------------------------------------

[~kedin] do you have any thoughts on this, or could you mention somebody else who might?

> Equality is broken for Rows with BYTES field
> --------------------------------------------
>
>                 Key: BEAM-5646
>                 URL: https://issues.apache.org/jira/browse/BEAM-5646
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-sql
>    Affects Versions: 2.7.0
>            Reporter: Gleb Kanterov
>            Assignee: Xu Mingmin
>            Priority: Major
>
> The problem is with `org.apache.beam.sdk.values.Row#equals` and `hashCode`. 
> Java arrays use reference equality rather than comparing contents, and Row 
> stores fields of type BYTES as `byte[]`.
> These failing tests illustrate the problem:
> {code:java}
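> import java.nio.ByteBuffer;
> import org.apache.beam.sdk.schemas.Schema;
> import org.apache.beam.sdk.values.Row;
> import org.junit.Assert;
> import org.junit.Test;
>
> @Test
> public void testByteArrayEquality() {
>   // Two distinct arrays with identical contents (16 zero bytes each).
>   byte[] a0 = new byte[16];
>   byte[] b0 = new byte[16];
>   Schema schema = Schema.of(Schema.Field.of("bytes", Schema.FieldType.BYTES));
>   Row a = Row.withSchema(schema).addValue(a0).build();
>   Row b = Row.withSchema(schema).addValue(b0).build();
>   // Expected to pass, but fails on 2.7.0.
>   Assert.assertEquals(a, b);
> }
>
> @Test
> public void testByteBufferEquality() {
>   // Same check, with the values passed in wrapped as ByteBuffer.
>   byte[] a0 = new byte[16];
>   byte[] b0 = new byte[16];
>   Schema schema = Schema.of(Schema.Field.of("bytes", Schema.FieldType.BYTES));
>   Row a = Row.withSchema(schema).addValue(ByteBuffer.wrap(a0)).build();
>   Row b = Row.withSchema(schema).addValue(ByteBuffer.wrap(b0)).build();
>   // Expected to pass, but fails on 2.7.0.
>   Assert.assertEquals(a, b);
> }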
> {code}
>  
> Option 1. Fix by storing `byte[]` as `ByteBuffer`, or as something simpler 
> that doesn't have offsets. `Row#getValue` would return this type, and it 
> would be preferable to change `Row#getBytes`, in an incompatible way, to 
> return it too, because the rest of the getters are consistent with 
> `Row#getValue`.
>  
> Option 2. Do the same as Spark does and add `if (x instanceof byte[])` to 
> `equals`. The problem in Spark is that its `hashCode` implementation isn't 
> consistent with `equals`; see SPARK-25122.
>  
> Option 3. Consider it intended behavior, and fix the 
> `RowCoder#consistentWithEquals` implementation.
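
To make the options in the description a bit more concrete, here is a minimal standalone sketch that uses only JDK classes, not Beam's actual `Row` code. The names `ByteArrayEqualityDemo`, `fieldEquals` and `fieldHashCode` are hypothetical and exist only for this illustration. It shows the reference-equality behavior of `byte[]`, the content-based equality of `ByteBuffer` (roughly the idea behind Option 1), and an `instanceof byte[]` check applied to both `equals` and `hashCode` so that the two stay consistent (roughly Option 2, without the inconsistency mentioned for Spark).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Objects;

class ByteArrayEqualityDemo {

  // Hypothetical helper: compares two field values, treating byte[] by content.
  static boolean fieldEquals(Object a, Object b) {
    if (a instanceof byte[] && b instanceof byte[]) {
      return Arrays.equals((byte[]) a, (byte[]) b);
    }
    return Objects.equals(a, b);
  }

  // Hypothetical helper: hash that agrees with fieldEquals for byte[] values.
  static int fieldHashCode(Object a) {
    if (a instanceof byte[]) {
      return Arrays.hashCode((byte[]) a);
    }
    return Objects.hashCode(a);
  }

  public static void main(String[] args) {
    byte[] a0 = new byte[16];
    byte[] b0 = new byte[16];

    // Arrays inherit Object#equals, which compares references.
    System.out.println(a0.equals(b0)); // false

    // ByteBuffer#equals compares the remaining contents of the buffers.
    System.out.println(ByteBuffer.wrap(a0).equals(ByteBuffer.wrap(b0))); // true

    // Content-aware equals and hashCode that agree with each other.
    System.out.println(fieldEquals(a0, b0)); // true
    System.out.println(fieldHashCode(a0) == fieldHashCode(b0)); // true
  }
}
```

Whichever option is chosen, the point of the sketch is that `equals` and `hashCode` need the same special case for `byte[]`; otherwise rows that compare equal could land in different hash buckets.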



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
