[ https://issues.apache.org/jira/browse/BEAM-5646?focusedWorklogId=159250&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-159250 ]
ASF GitHub Bot logged work on BEAM-5646:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Oct/18 17:05
            Start Date: 26/Oct/18 17:05
    Worklog Time Spent: 10m
      Work Description: amaliujia commented on a change in pull request #6765: [BEAM-5646] Fix quality and hashcode for bytes in Row.
URL: https://github.com/apache/beam/pull/6765#discussion_r228600245

 ##########
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/values/Row.java
 ##########
 @@ -347,12 +347,12 @@ public boolean equals(Object o) {
      }
      Row other = (Row) o;
      return Objects.equals(getSchema(), other.getSchema())
 -        && Objects.equals(getValues(), other.getValues());
 +        && Objects.deepEquals(getValues().toArray(), other.getValues().toArray());

 Review comment:
   The problem is that `Objects.deepEquals` only performs a deep comparison for arrays (including arrays of primitives); every other value is compared with its own `equals`. So a `byte[]` nested inside a Map or List value is still compared by reference, and equality is still wrong for those fields.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 159250)
    Time Spent: 4.5h  (was: 4h 20m)

> Equality is broken for Rows with BYTES field
> --------------------------------------------
>
>                 Key: BEAM-5646
>                 URL: https://issues.apache.org/jira/browse/BEAM-5646
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-sql
>    Affects Versions: 2.7.0
>            Reporter: Gleb Kanterov
>            Assignee: Rui Wang
>            Priority: Major
>             Fix For: Not applicable
>
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The problem is with `org.apache.beam.sdk.values.Row#equals` and `hashCode`.
> Java arrays use reference equality instead of comparing contents. Row stores
> fields of type BYTES as byte[].
> These failing tests illustrate the problem:
> {code:java}
> @Test
> public void testByteArrayEquality() {
>   byte[] a0 = new byte[16];
>   byte[] b0 = new byte[16];
>   Schema schema = Schema.of(Schema.Field.of("bytes", Schema.FieldType.BYTES));
>   Row a = Row.withSchema(schema).addValue(a0).build();
>   Row b = Row.withSchema(schema).addValue(b0).build();
>   Assert.assertEquals(a, b);
> }
>
> @Test
> public void testByteBufferEquality() {
>   byte[] a0 = new byte[16];
>   byte[] b0 = new byte[16];
>   Schema schema = Schema.of(Schema.Field.of("bytes", Schema.FieldType.BYTES));
>   Row a = Row.withSchema(schema).addValue(ByteBuffer.wrap(a0)).build();
>   Row b = Row.withSchema(schema).addValue(ByteBuffer.wrap(b0)).build();
>   Assert.assertEquals(a, b);
> }
> {code}
>
> Option 1. Fix by storing `byte[]` as `ByteBuffer`, or as something simpler
> that doesn't have offsets. `Row#getValue` would return this type, and it
> would be preferable to change `Row#getBytes` (incompatibly) to return the
> same type as `Row#getValue`, because that is how the rest of the getters
> behave.
>
> Option 2. Do what Spark does: add `if (x instanceof byte[])` to
> `equals`. The problem in Spark is that its `hashCode` implementation isn't
> consistent with `equals`, see SPARK-25122.
>
> Option 3. Consider it intended behavior, and fix the
> `RowCoder#consistentWithEquals` implementation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
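
To make the review comment and Option 2 concrete, here is a minimal sketch in plain Java with no Beam dependency. It shows that `Objects.deepEquals` compares a top-level `byte[]` by content but falls back to `List#equals` (and therefore reference equality) once the array sits inside a list. It then pairs a content-based comparison with a matching hash function so the two stay consistent. The names `RowEqualitySketch`, `contentEquals` and `contentHashCode` are hypothetical, chosen for illustration; this is not the code merged in the pull request, and map keys are assumed to be ordinary non-array objects.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only; not part of the Beam API.
final class RowEqualitySketch {

  // Content-based equality: byte[] by content, recursing into List and Map values.
  static boolean contentEquals(Object a, Object b) {
    if (a instanceof byte[] && b instanceof byte[]) {
      return Arrays.equals((byte[]) a, (byte[]) b);
    }
    if (a instanceof List && b instanceof List) {
      List<?> la = (List<?>) a;
      List<?> lb = (List<?>) b;
      if (la.size() != lb.size()) {
        return false;
      }
      Iterator<?> ia = la.iterator();
      Iterator<?> ib = lb.iterator();
      while (ia.hasNext()) {
        if (!contentEquals(ia.next(), ib.next())) {
          return false;
        }
      }
      return true;
    }
    if (a instanceof Map && b instanceof Map) {
      Map<?, ?> ma = (Map<?, ?>) a;
      Map<?, ?> mb = (Map<?, ?>) b;
      if (ma.size() != mb.size()) {
        return false;
      }
      // Assumption: keys are compared with their own equals (no byte[] keys).
      for (Map.Entry<?, ?> e : ma.entrySet()) {
        if (!mb.containsKey(e.getKey())
            || !contentEquals(e.getValue(), mb.get(e.getKey()))) {
          return false;
        }
      }
      return true;
    }
    return Objects.equals(a, b);
  }

  // Hash that agrees with contentEquals: equal contents give equal hashes.
  static int contentHashCode(Object o) {
    if (o instanceof byte[]) {
      return Arrays.hashCode((byte[]) o);
    }
    if (o instanceof List) {
      int h = 1;
      for (Object e : (List<?>) o) {
        h = 31 * h + contentHashCode(e);
      }
      return h;
    }
    if (o instanceof Map) {
      int h = 0; // order-independent sum, like AbstractMap#hashCode
      for (Map.Entry<?, ?> e : ((Map<?, ?>) o).entrySet()) {
        h += Objects.hashCode(e.getKey()) ^ contentHashCode(e.getValue());
      }
      return h;
    }
    return Objects.hashCode(o);
  }

  public static void main(String[] args) {
    byte[] a = new byte[16];
    byte[] b = new byte[16];
    List<byte[]> la = Collections.singletonList(a);
    List<byte[]> lb = Collections.singletonList(b);

    System.out.println(a.equals(b));                  // false: reference equality
    System.out.println(Objects.deepEquals(a, b));     // true: top-level array
    System.out.println(Objects.deepEquals(la, lb));   // false: falls back to List#equals
    System.out.println(contentEquals(la, lb));        // true
    System.out.println(contentHashCode(la) == contentHashCode(lb)); // true
  }
}
```

Whatever shape the real fix takes, the point of pairing the two helpers is the one raised under Option 2: if `equals` becomes content-based, `hashCode` has to hash the same contents, otherwise equal Rows can land in different hash buckets.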