[ https://issues.apache.org/jira/browse/BEAM-5866?focusedWorklogId=159727&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-159727 ]
ASF GitHub Bot logged work on BEAM-5866: ---------------------------------------- Author: ASF GitHub Bot Created on: 28/Oct/18 14:24 Start Date: 28/Oct/18 14:24 Worklog Time Spent: 10m Work Description: kanterov commented on issue #6845: [BEAM-5866] Fix `Row#equals` URL: https://github.com/apache/beam/pull/6845#issuecomment-433710063 @kennknowles as it was last implemented, `Row#equals` didn't do any allocations, or used `structuralValue`. It became a bit confusing because ghere are two different, but related issues: 1. `structuralValue` wasn't working properly for `Row`, `Map` and `List` 2.`Row#equals` didn't handle lists or maps with BYTES I've put the fix for the first issue into a separate pull request: https://github.com/apache/beam/pull/6862. In this pull request, I removed everything not relevant for `Row#equals` As we already discussed, there are two approaches to fix equality: - rolling own `deepEquals` - wrapping `byte[]` into object implementing equality, but not in `structuralValue`, but while we build rows Not to talk about performance without having any numbers, I did two implementations and benchmarked both with JMH. Approach with own `deepEquals` has better performance by almost 50%. There are two reasons: - we can abuse the fact that our lists have random access and don't need to allocate iterators, it makes it even more performant than anything using `AbstractList#equals` - getting field type from schema is faster than using reflection There is still a problem with `Map<byte[], ?>`, it could be fixed either by banning such schemas, or rolling own `ByteArrayMap<K> extends Map<byte[], K>` ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ------------------- Worklog Id: (was: 159727) Time Spent: 3h 50m (was: 3h 40m) > RowCoder doesn't implement structuralValue > ------------------------------------------ > > Key: BEAM-5866 > URL: https://issues.apache.org/jira/browse/BEAM-5866 > Project: Beam > Issue Type: Bug > Components: sdk-java-core > Reporter: Gleb Kanterov > Assignee: Gleb Kanterov > Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > These two properties fail for RowCoder with `BYTES` field, or `Map<BYTES, ?>` > field. > {code} > public static <T> void testConsistentWithEquals(Coder<T> coder, T example) { > assumeTrue(coder.consistentWithEquals()); > byte[] bytes = encodeBytes(coder, example); > // even if the coder is non-deterministic, if the encoded bytes match, > // coder is consistent with equals, decoded values must be equal > T out0 = decodeBytes(coder, bytes); > T out1 = decodeBytes(coder, bytes); > assertEquals("If the encoded bytes match, decoded values must be equal", > out0, out1); > assertEquals( > "If two values are equal, their hash codes must be equal", > out0.hashCode(), > out1.hashCode()); > } > public static <T> void testStructuralValueConsistentWithEquals(Coder<T> > coder, T example) { > byte[] bytes = encodeBytes(coder, example); > // even if coder is non-deterministic, if the encoded bytes match, > // structural values must be equal > Object out0 = coder.structuralValue(decodeBytes(coder, bytes)); > Object out1 = coder.structuralValue(decodeBytes(coder, bytes)); > assertEquals("If the encoded bytes match, structural values must be > equal", out0, out1); > assertEquals( > "If two values are equal, their hash codes must be equal", > out0.hashCode(), > out1.hashCode()); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)