[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible
[ https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741548#comment-15741548 ] Michael Dürig commented on OAK-5253: My bad, just corrected my comment. > Optimize AbstractBlob#equal to not do content equals when possible > -- > > Key: OAK-5253 > URL: https://issues.apache.org/jira/browse/OAK-5253 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob >Reporter: Amit Jain >Assignee: Amit Jain > Attachments: OAK-5253.1.patch > > > AbstractBlob#equals tries to match content when length is equal and content > identities is not null and different. Matching content triggers an expensive > download of binaries for S3DataStore. > Since, right now the content identity is the content hash the check can be > short -circuited when the content identities is not null and not equal to > return false. > This can be revisited if we change the identity to something different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible
[ https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741389#comment-15741389 ] Julian Sedding commented on OAK-5253: - bq. [...] we should think about how to surface that information out from the DataStore/BlobStore implementations that use content hash to generate ids [...] +1 IMHO this is the best way forward > Optimize AbstractBlob#equal to not do content equals when possible > -- > > Key: OAK-5253 > URL: https://issues.apache.org/jira/browse/OAK-5253 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob >Reporter: Amit Jain >Assignee: Amit Jain > Fix For: 1.6, 1.5.16, 1.4.11 > > Attachments: OAK-5253.1.patch > > > AbstractBlob#equals tries to match content when length is equal and content > identities is not null and different. Matching content triggers an expensive > download of binaries for S3DataStore. > Since, right now the content identity is the content hash the check can be > short -circuited when the content identities is not null and not equal to > return false. > This can be revisited if we change the identity to something different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible
[ https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741381#comment-15741381 ] Amit Jain commented on OAK-5253: [~mduerig], [~jsedding] iirc then your recommendation is that we don't do this optimization? In which case I can resolve this issue. But then we should think about how to surface that information out from the DataStore/BlobStore implementations that use content hash to generate ids which currently all implementations do. > Optimize AbstractBlob#equal to not do content equals when possible > -- > > Key: OAK-5253 > URL: https://issues.apache.org/jira/browse/OAK-5253 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob >Reporter: Amit Jain >Assignee: Amit Jain > Fix For: 1.6, 1.5.16, 1.4.11 > > Attachments: OAK-5253.1.patch > > > AbstractBlob#equals tries to match content when length is equal and content > identities is not null and different. Matching content triggers an expensive > download of binaries for S3DataStore. > Since, right now the content identity is the content hash the check can be > short -circuited when the content identities is not null and not equal to > return false. > This can be revisited if we change the identity to something different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible
[ https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741374#comment-15741374 ] Julian Sedding commented on OAK-5253: - [~mduerig] I understood it to be the other way around: it's safe to return true if the blob ids are equal, but it is unsafe to conclude that they are different if their blob ids are different. In other words: blobs with identical bytes may have different blob ids. > Optimize AbstractBlob#equal to not do content equals when possible > -- > > Key: OAK-5253 > URL: https://issues.apache.org/jira/browse/OAK-5253 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob >Reporter: Amit Jain >Assignee: Amit Jain > Fix For: 1.6, 1.5.16, 1.4.11 > > Attachments: OAK-5253.1.patch > > > AbstractBlob#equals tries to match content when length is equal and content > identities is not null and different. Matching content triggers an expensive > download of binaries for S3DataStore. > Since, right now the content identity is the content hash the check can be > short -circuited when the content identities is not null and not equal to > return false. > This can be revisited if we change the identity to something different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible
[ https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741363#comment-15741363 ] Michael Dürig commented on OAK-5253: I share [~jsedding]'s concern. I think it is fine to return {{false}} if the blob ids differ. Otherwise we probably still have to delegate to {{AbstractBlob.equals()}}. Fully relying on the blob id for identity is not correct and {{BlobStoreBlob.equals()}} seems even wrong to that respect. > Optimize AbstractBlob#equal to not do content equals when possible > -- > > Key: OAK-5253 > URL: https://issues.apache.org/jira/browse/OAK-5253 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob >Reporter: Amit Jain >Assignee: Amit Jain > Fix For: 1.6, 1.5.16, 1.4.11 > > Attachments: OAK-5253.1.patch > > > AbstractBlob#equals tries to match content when length is equal and content > identities is not null and different. Matching content triggers an expensive > download of binaries for S3DataStore. > Since, right now the content identity is the content hash the check can be > short -circuited when the content identities is not null and not equal to > return false. > This can be revisited if we change the identity to something different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible
[ https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741338#comment-15741338 ] Julian Sedding commented on OAK-5253: - [~amitjain] In {{BlobStoreBlob#equals}} there is the following comment before comparing blob ids (and that's the best I could find regarding the guarantees of blob ids). {quote} // theoretically, the data could be the same // even if the id is different return b.blobId.equals(blobId); {quote} So strictly speaking, just like with {{#getContentIdentity()}}, equal blob ids guarantee that two blobs are equal, but different blob ids do not guarantee that two blobs are different. I don't want to be a pain, and since {{BlobStoreBlob}} already uses the blob ids to implement equals, I won't object here. However, I would prefer if we could extend or clarify the contract of {{BlobStore}} or {{Blob}} in order to accommodate the use-case of determining that two blobs are different without comparing their bytes. > Optimize AbstractBlob#equal to not do content equals when possible > -- > > Key: OAK-5253 > URL: https://issues.apache.org/jira/browse/OAK-5253 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob >Reporter: Amit Jain >Assignee: Amit Jain > Fix For: 1.6, 1.5.16, 1.4.11 > > Attachments: OAK-5253.1.patch > > > AbstractBlob#equals tries to match content when length is equal and content > identities is not null and different. Matching content triggers an expensive > download of binaries for S3DataStore. > Since, right now the content identity is the content hash the check can be > short -circuited when the content identities is not null and not equal to > return false. > This can be revisited if we change the identity to something different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible
[ https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741207#comment-15741207 ] Amit Jain commented on OAK-5253: We could change the {{SegmentBlob#equals()}} implementation to check for equality of getBlobIds() if non null otherwise delegate it to the AbstractBlob as being done currently. This is similar to what is already done for {{BlobStoreBlob#equals()}}. [~chetanm], [~jsedding], [~mduerig] wdyt? > Optimize AbstractBlob#equal to not do content equals when possible > -- > > Key: OAK-5253 > URL: https://issues.apache.org/jira/browse/OAK-5253 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob >Reporter: Amit Jain >Assignee: Amit Jain > Fix For: 1.6, 1.5.16, 1.4.11 > > Attachments: OAK-5253.1.patch > > > AbstractBlob#equals tries to match content when length is equal and content > identities is not null and different. Matching content triggers an expensive > download of binaries for S3DataStore. > Since, right now the content identity is the content hash the check can be > short -circuited when the content identities is not null and not equal to > return false. > This can be revisited if we change the identity to something different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible
[ https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735027#comment-15735027 ] Julian Sedding commented on OAK-5253: - Thanks [~amitjain]. I am wondering if we could compare {{Blob#getReference()}}, but the contract is not very clear regarding its equality semantics. Basically I believe that we need to use something that is delegated to the {{BlobStore}} implementation. A change in {{SegmentBlob}} would suffer from the same problem as a change in {{AbstractBlob}}. It is backed by an arbitrary {{BlobStore}} implementation, thus we cannot rely on the semantics of {{Blob#getContentIdentity()}}, as it may change depending on the backing {{BlobStore}}. I have posted on the mailing list to see if someone can clarify whether {{Blob#getReference()}} is suited. > Optimize AbstractBlob#equal to not do content equals when possible > -- > > Key: OAK-5253 > URL: https://issues.apache.org/jira/browse/OAK-5253 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob >Reporter: Amit Jain >Assignee: Amit Jain > Fix For: 1.6, 1.5.16, 1.4.11 > > Attachments: OAK-5253.1.patch > > > AbstractBlob#equals tries to match content when length is equal and content > identities is not null and different. Matching content triggers an expensive > download of binaries for S3DataStore. > Since, right now the content identity is the content hash the check can be > short -circuited when the content identities is not null and not equal to > return false. > This can be revisited if we change the identity to something different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible
[ https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734957#comment-15734957 ] Amit Jain commented on OAK-5253: Reverted commits: * trunk - http://svn.apache.org/viewvc?rev=1773355&view=rev * 1.4 - http://svn.apache.org/viewvc?rev=1773353&view=rev > Optimize AbstractBlob#equal to not do content equals when possible > -- > > Key: OAK-5253 > URL: https://issues.apache.org/jira/browse/OAK-5253 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob >Reporter: Amit Jain >Assignee: Amit Jain > Fix For: 1.6, 1.5.16, 1.4.11 > > Attachments: OAK-5253.1.patch > > > AbstractBlob#equals tries to match content when length is equal and content > identities is not null and different. Matching content triggers an expensive > download of binaries for S3DataStore. > Since, right now the content identity is the content hash the check can be > short -circuited when the content identities is not null and not equal to > return false. > This can be revisited if we change the identity to something different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible
[ https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734916#comment-15734916 ] Amit Jain commented on OAK-5253: Ok will back out the changes and try to make change in the SegmentBlob itself. > Optimize AbstractBlob#equal to not do content equals when possible > -- > > Key: OAK-5253 > URL: https://issues.apache.org/jira/browse/OAK-5253 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob >Reporter: Amit Jain >Assignee: Amit Jain > Fix For: 1.6, 1.5.16, 1.4.11 > > Attachments: OAK-5253.1.patch > > > AbstractBlob#equals tries to match content when length is equal and content > identities is not null and different. Matching content triggers an expensive > download of binaries for S3DataStore. > Since, right now the content identity is the content hash the check can be > short -circuited when the content identities is not null and not equal to > return false. > This can be revisited if we change the identity to something different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible
[ https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734550#comment-15734550 ] Amit Jain commented on OAK-5253: Thanks Matt! Was already wip. But for 1.4 needed some backports as well OAK-4789 & OAK-5009. > Optimize AbstractBlob#equal to not do content equals when possible > -- > > Key: OAK-5253 > URL: https://issues.apache.org/jira/browse/OAK-5253 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob >Reporter: Amit Jain >Assignee: Amit Jain > Fix For: 1.6, 1.5.16, 1.4.11 > > Attachments: OAK-5253.1.patch > > > AbstractBlob#equals tries to match content when length is equal and content > identities is not null and different. Matching content triggers an expensive > download of binaries for S3DataStore. > Since, right now the content identity is the content hash the check can be > short -circuited when the content identities is not null and not equal to > return false. > This can be revisited if we change the identity to something different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible
[ https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15733366#comment-15733366 ] Matt Ryan commented on OAK-5253: I just checked the patch with 1.4 and it applies cleanly. My testing indicates it should work fine with 1.4. > Optimize AbstractBlob#equal to not do content equals when possible > -- > > Key: OAK-5253 > URL: https://issues.apache.org/jira/browse/OAK-5253 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: blob >Reporter: Amit Jain >Assignee: Amit Jain > Fix For: 1.6, 1.5.16 > > Attachments: OAK-5253.1.patch > > > AbstractBlob#equals tries to match content when length is equal and content > identities is not null and different. Matching content triggers an expensive > download of binaries for S3DataStore. > Since, right now the content identity is the content hash the check can be > short -circuited when the content identities is not null and not equal to > return false. > This can be revisited if we change the identity to something different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)