[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible

2016-12-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741548#comment-15741548
 ] 

Michael Dürig commented on OAK-5253:


My bad, just corrected my comment. 

> Optimize AbstractBlob#equal to not do content equals when possible
> --
>
> Key: OAK-5253
> URL: https://issues.apache.org/jira/browse/OAK-5253
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Amit Jain
>Assignee: Amit Jain
> Attachments: OAK-5253.1.patch
>
>
> AbstractBlob#equals tries to match content when length is equal and content 
> identities is not null and different. Matching content triggers an expensive 
> download of binaries for S3DataStore.
> Since, right now the content identity is the content hash the check can be 
> short -circuited when the content identities is not null and not equal to 
> return false.
> This can be revisited if we change the identity to something different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible

2016-12-12 Thread Julian Sedding (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741389#comment-15741389
 ] 

Julian Sedding commented on OAK-5253:
-

bq. [...] we should think about how to surface that information out from the 
DataStore/BlobStore implementations that use content hash to generate ids [...]

+1 IMHO this is the best way forward

> Optimize AbstractBlob#equal to not do content equals when possible
> --
>
> Key: OAK-5253
> URL: https://issues.apache.org/jira/browse/OAK-5253
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Amit Jain
>Assignee: Amit Jain
> Fix For: 1.6, 1.5.16, 1.4.11
>
> Attachments: OAK-5253.1.patch
>
>
> AbstractBlob#equals tries to match content when length is equal and content 
> identities is not null and different. Matching content triggers an expensive 
> download of binaries for S3DataStore.
> Since, right now the content identity is the content hash the check can be 
> short -circuited when the content identities is not null and not equal to 
> return false.
> This can be revisited if we change the identity to something different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible

2016-12-12 Thread Amit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741381#comment-15741381
 ] 

Amit Jain commented on OAK-5253:


[~mduerig], [~jsedding] iirc then your recommendation is that we don't do this 
optimization? In which case I can resolve this issue.
But then we should think about how to surface that information out from the 
DataStore/BlobStore implementations that use content hash to generate ids which 
currently all implementations do.

> Optimize AbstractBlob#equal to not do content equals when possible
> --
>
> Key: OAK-5253
> URL: https://issues.apache.org/jira/browse/OAK-5253
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Amit Jain
>Assignee: Amit Jain
> Fix For: 1.6, 1.5.16, 1.4.11
>
> Attachments: OAK-5253.1.patch
>
>
> AbstractBlob#equals tries to match content when length is equal and content 
> identities is not null and different. Matching content triggers an expensive 
> download of binaries for S3DataStore.
> Since, right now the content identity is the content hash the check can be 
> short -circuited when the content identities is not null and not equal to 
> return false.
> This can be revisited if we change the identity to something different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible

2016-12-12 Thread Julian Sedding (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741374#comment-15741374
 ] 

Julian Sedding commented on OAK-5253:
-

[~mduerig] I understood it to be the other way around: it's safe to return true 
if the blob ids are equal, but it is unsafe to conclude that they are different 
if their blob ids are different. In other words: blobs with identical bytes may 
have different blob ids.

> Optimize AbstractBlob#equal to not do content equals when possible
> --
>
> Key: OAK-5253
> URL: https://issues.apache.org/jira/browse/OAK-5253
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Amit Jain
>Assignee: Amit Jain
> Fix For: 1.6, 1.5.16, 1.4.11
>
> Attachments: OAK-5253.1.patch
>
>
> AbstractBlob#equals tries to match content when length is equal and content 
> identities is not null and different. Matching content triggers an expensive 
> download of binaries for S3DataStore.
> Since, right now the content identity is the content hash the check can be 
> short -circuited when the content identities is not null and not equal to 
> return false.
> This can be revisited if we change the identity to something different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible

2016-12-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741363#comment-15741363
 ] 

Michael Dürig commented on OAK-5253:


I share [~jsedding]'s concern. I think it is fine to return {{false}} if the 
blob ids differ. Otherwise we probably still have to delegate to 
{{AbstractBlob.equals()}}. Fully relying on the blob id for identity is not 
correct and {{BlobStoreBlob.equals()}} seems even wrong to that respect. 

> Optimize AbstractBlob#equal to not do content equals when possible
> --
>
> Key: OAK-5253
> URL: https://issues.apache.org/jira/browse/OAK-5253
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Amit Jain
>Assignee: Amit Jain
> Fix For: 1.6, 1.5.16, 1.4.11
>
> Attachments: OAK-5253.1.patch
>
>
> AbstractBlob#equals tries to match content when length is equal and content 
> identities is not null and different. Matching content triggers an expensive 
> download of binaries for S3DataStore.
> Since, right now the content identity is the content hash the check can be 
> short -circuited when the content identities is not null and not equal to 
> return false.
> This can be revisited if we change the identity to something different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible

2016-12-12 Thread Julian Sedding (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741338#comment-15741338
 ] 

Julian Sedding commented on OAK-5253:
-

[~amitjain] In {{BlobStoreBlob#equals}} there is the following comment before 
comparing blob ids (and that's the best I could find regarding the guarantees 
of blob ids).

{quote}
// theoretically, the data could be the same  
// even if the id is different
return b.blobId.equals(blobId);
{quote}

So strictly speaking, just like with {{#getContentIdentity()}}, equal blob ids 
guarantee that two blobs are equal, but different blob ids do not guarantee 
that two blobs are different.

I don't want to be a pain, and since {{BlobStoreBlob}} already uses the blob 
ids to implement equals, I won't object here. However, I would prefer if we 
could extend or clarify the contract of {{BlobStore}} or {{Blob}} in order to 
accommodate the use-case of determining that two blobs are different without 
comparing their bytes.

> Optimize AbstractBlob#equal to not do content equals when possible
> --
>
> Key: OAK-5253
> URL: https://issues.apache.org/jira/browse/OAK-5253
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Amit Jain
>Assignee: Amit Jain
> Fix For: 1.6, 1.5.16, 1.4.11
>
> Attachments: OAK-5253.1.patch
>
>
> AbstractBlob#equals tries to match content when length is equal and content 
> identities is not null and different. Matching content triggers an expensive 
> download of binaries for S3DataStore.
> Since, right now the content identity is the content hash the check can be 
> short -circuited when the content identities is not null and not equal to 
> return false.
> This can be revisited if we change the identity to something different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible

2016-12-11 Thread Amit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741207#comment-15741207
 ] 

Amit Jain commented on OAK-5253:


We could change the {{SegmentBlob#equals()}} implementation to check for 
equality of getBlobIds() if non null otherwise delegate it to the AbstractBlob 
as being done currently.
This is similar to what is already done for {{BlobStoreBlob#equals()}}.

[~chetanm], [~jsedding], [~mduerig] wdyt?

> Optimize AbstractBlob#equal to not do content equals when possible
> --
>
> Key: OAK-5253
> URL: https://issues.apache.org/jira/browse/OAK-5253
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Amit Jain
>Assignee: Amit Jain
> Fix For: 1.6, 1.5.16, 1.4.11
>
> Attachments: OAK-5253.1.patch
>
>
> AbstractBlob#equals tries to match content when length is equal and content 
> identities is not null and different. Matching content triggers an expensive 
> download of binaries for S3DataStore.
> Since, right now the content identity is the content hash the check can be 
> short -circuited when the content identities is not null and not equal to 
> return false.
> This can be revisited if we change the identity to something different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible

2016-12-09 Thread Julian Sedding (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735027#comment-15735027
 ] 

Julian Sedding commented on OAK-5253:
-

Thanks [~amitjain]. I am wondering if we could compare {{Blob#getReference()}}, 
but the contract is not very clear regarding its equality semantics. Basically 
I believe that we need to use something that is delegated to the {{BlobStore}} 
implementation.

A change in {{SegmentBlob}} would suffer from the same problem as a change in 
{{AbstractBlob}}. It is backed by an arbitrary {{BlobStore}} implementation, 
thus we cannot rely on the semantics of {{Blob#getContentIdentity()}}, as it 
may change depending on the backing {{BlobStore}}.

I have posted on the mailing list to see if someone can clarify whether 
{{Blob#getReference()}} is suited.

> Optimize AbstractBlob#equal to not do content equals when possible
> --
>
> Key: OAK-5253
> URL: https://issues.apache.org/jira/browse/OAK-5253
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Amit Jain
>Assignee: Amit Jain
> Fix For: 1.6, 1.5.16, 1.4.11
>
> Attachments: OAK-5253.1.patch
>
>
> AbstractBlob#equals tries to match content when length is equal and content 
> identities is not null and different. Matching content triggers an expensive 
> download of binaries for S3DataStore.
> Since, right now the content identity is the content hash the check can be 
> short -circuited when the content identities is not null and not equal to 
> return false.
> This can be revisited if we change the identity to something different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible

2016-12-09 Thread Amit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734957#comment-15734957
 ] 

Amit Jain commented on OAK-5253:


Reverted commits:
* trunk - http://svn.apache.org/viewvc?rev=1773355&view=rev
* 1.4 - http://svn.apache.org/viewvc?rev=1773353&view=rev

> Optimize AbstractBlob#equal to not do content equals when possible
> --
>
> Key: OAK-5253
> URL: https://issues.apache.org/jira/browse/OAK-5253
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Amit Jain
>Assignee: Amit Jain
> Fix For: 1.6, 1.5.16, 1.4.11
>
> Attachments: OAK-5253.1.patch
>
>
> AbstractBlob#equals tries to match content when length is equal and content 
> identities is not null and different. Matching content triggers an expensive 
> download of binaries for S3DataStore.
> Since, right now the content identity is the content hash the check can be 
> short -circuited when the content identities is not null and not equal to 
> return false.
> This can be revisited if we change the identity to something different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible

2016-12-09 Thread Amit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734916#comment-15734916
 ] 

Amit Jain commented on OAK-5253:


Ok will back out the changes and try to make change in the SegmentBlob itself.

> Optimize AbstractBlob#equal to not do content equals when possible
> --
>
> Key: OAK-5253
> URL: https://issues.apache.org/jira/browse/OAK-5253
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Amit Jain
>Assignee: Amit Jain
> Fix For: 1.6, 1.5.16, 1.4.11
>
> Attachments: OAK-5253.1.patch
>
>
> AbstractBlob#equals tries to match content when length is equal and content 
> identities is not null and different. Matching content triggers an expensive 
> download of binaries for S3DataStore.
> Since, right now the content identity is the content hash the check can be 
> short -circuited when the content identities is not null and not equal to 
> return false.
> This can be revisited if we change the identity to something different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible

2016-12-08 Thread Amit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734550#comment-15734550
 ] 

Amit Jain commented on OAK-5253:


Thanks Matt! Was already wip. But for 1.4 needed some backports as well 
OAK-4789 & OAK-5009.

> Optimize AbstractBlob#equal to not do content equals when possible
> --
>
> Key: OAK-5253
> URL: https://issues.apache.org/jira/browse/OAK-5253
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Amit Jain
>Assignee: Amit Jain
> Fix For: 1.6, 1.5.16, 1.4.11
>
> Attachments: OAK-5253.1.patch
>
>
> AbstractBlob#equals tries to match content when length is equal and content 
> identities is not null and different. Matching content triggers an expensive 
> download of binaries for S3DataStore.
> Since, right now the content identity is the content hash the check can be 
> short -circuited when the content identities is not null and not equal to 
> return false.
> This can be revisited if we change the identity to something different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-5253) Optimize AbstractBlob#equal to not do content equals when possible

2016-12-08 Thread Matt Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15733366#comment-15733366
 ] 

Matt Ryan commented on OAK-5253:


I just checked the patch with 1.4 and it applies cleanly.  My testing indicates 
it should work fine with 1.4.

> Optimize AbstractBlob#equal to not do content equals when possible
> --
>
> Key: OAK-5253
> URL: https://issues.apache.org/jira/browse/OAK-5253
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Amit Jain
>Assignee: Amit Jain
> Fix For: 1.6, 1.5.16
>
> Attachments: OAK-5253.1.patch
>
>
> AbstractBlob#equals tries to match content when length is equal and content 
> identities is not null and different. Matching content triggers an expensive 
> download of binaries for S3DataStore.
> Since, right now the content identity is the content hash the check can be 
> short -circuited when the content identities is not null and not equal to 
> return false.
> This can be revisited if we change the identity to something different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)