[ 
https://issues.apache.org/jira/browse/OAK-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569802#comment-16569802
 ] 

Alexander Klimetschek edited comment on OAK-7569 at 8/6/18 6:56 AM:
--------------------------------------------------------------------

h4. Problem

Unfortunately, we could not commit the final agreed change this week, because 
[~mattvryan] found a blocking issue: {{BinaryDownload.getURI()}} always returns 
null because the regular code path {{node.getProperty().getBinary()}} does not 
pass the {{BlobAccessProvider}} instance through to the {{ValueFactoryImpl}} 
because it uses a [static method of ValueFactoryImpl in 
PropertyImpl.getValue()|https://github.com/apache/jackrabbit-oak/blob/905c9bd0f716778a035a708bbdb8de634f464e66/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/session/PropertyImpl.java#L252-L253].

Looking further, there are actually many static method usages of 
{{ValueFactoryImpl}} today, that all need to ensure a {{BlobAccessProvider}} is 
available.

[~mattvryan] and me think there are 2 solutions:
 # fix the static ValueFactoryImpl issue as much as possible (but ALL cases 
seem to be impossible to fix without a major Oak refactoring)
 # go back to the original API design (2a) or a slight variation thereof (2b) 
as a feasible compromise.

More details below.
h4. Details

The unit tests so far had a shortcut where it reused the {{Binary}} used for 
writing, hence no one saw the issue earlier. This has been fixed in the latest 
changes for the integration tests: 
[https://github.com/mattvryan/jackrabbit-oak/pull/12] (also includes a lot of 
improvements around the test framework itself).

This issue, that did not exist initially, and was tested, was introduced as a 
result of the refactorings during the review - which moved the API to a 
ValueFactory extension. At the same time the unit tests changed a bit, 
accidentally removing the important test case.

It is possible to fix this particular Value access case, since {{PropertyImpl}} 
already has access to the per-session {{ValueFactoryImpl}} instance, and it 
just needs to replace all the static usages left by leveraging 
{{getValueFactory()}}. Same for some other places in oak-jcr and oak-core that 
currently employ a mix of instance and static method access. [~mattvryan] did 
an attempt at this here: 
[https://github.com/mattvryan/jackrabbit-oak/commit/e022f846c0ff113c95910d26584568004beacb66]

Ideally, to ensure the {{blobAccessProvider}} reference in the 
{{ValueFactoryImpl}} is always set to the reference from the whiteboard that 
the {{SessionContext}} has, one would have to eliminate the static methods on 
{{ValueFactoryImpl}} and replace by instance methods. The class isn't doing 
proper encapsulation at this point, which is visible by the fact that 
{{NamePathMapper}}, an implementation detail the ValueFactoryImpl relies on, is 
actually all over the Oak code base.

*However, there are two static {{ValueFactoryImpl.createValue()}} methods 
retrieving a PropertyState and PropertyValue argument respectively (+ 
NamePathMapper) that are used in different places in oak, including 
oak-security-spi.* These are completely unrelated to the binary change. They 
don't have any immediate access to the {{SessionContext}} (which provides both 
the {{ValueFactoryImpl}} and {{BlobAcccessProvider}} instance). They all get a 
{{NamePathMapper}} passed in, which is essentially the same situation that a 
component reference must be available for the {{ValueFactoryImpl}} methods to 
do their work, but if you follow the call hierarchy it literally explodes. 
Meaning a lot of files all over Oak would have to be changed to actually pass 
through the {{ValueFactoryImpl}} instance instead of just {{NamePathMapper}}.

But even then, in many places it does not even get any access to the real 
{{NamePathMapper}} which {{SessionContext}} implements, but a 
{{NamePathMapper.DEFAULT}} is used (which apparently works for some of these 
cases?). It might be that some components were never designed to get access to 
the {{SessionContext}}, at least it seems it would be a massive refactoring to 
ensure that everywhere.
h4. Unsolved Binary cases

How does that affect the direct binary access feature? One example is 
[AuthorizablePropertiesImpl|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/security/user/AuthorizablePropertiesImpl.java#L108].
 This corresponds to the code path of 
{{Authorizable.getProperty("foo").getValue().getBinary()}}, which would never 
be able to return anything from {{getURI()}}. Note that it will implement 
{{BinaryDownload}}, and the API contract is that {{getURI()}} shall return null 
if the feature is not available, but not if the client used the "wrong" way of 
retrieving a {{Binary}}.

Other cases are hard to reason about: it's unclear where exactly they could end 
up being used by a JCR client. We tried to analyze it but I am not sure one can 
make guarantees... and since the static methods aren't property-type specific, 
one cannot exclude binary properties. Furthermore we have to consider future 
usage of these static methods (unless they become deprecated or removed).

FWIW, this is the reason in our original patch we did not extended the 
{{ValueFactory}} at all but used an extension interface: 
{{BinaryAccessProvider}} on {{Session}}, as then access to the 
{{SessionContext}} and anything else is trivial. The idea of the 
{{ValueFactory}} as creator of values sounds good in theory, but the 
implementation in Oak using static methods creates values "out of thin air" 
with no access to other components. Except for NamePathMapper, which apparently 
was there since the beginning of Oak and hence was incorporated in all the code 
over time.

Also note that this only applies to downloads (BinaryDownload.getURI()). The 
upload ({{JackrabbitValueFactory.completeBinaryUpload()}}) is never affected, 
because the JCR client code will retrieve the {{JackrabbitValueFactory}} 
instance from {{Session.getValueFactory()}}, which will always be the "good" 
instance.
h4. Options

[~mattvryan] and me identified these two options at this point. Given the huge 
effort of getting rid of the static {{ValueFactoryImpl.createValue()}} methods 
completely (a patch touching all of Oak), we excluded that one as a viable 
option - feedback welcome.
 # *Fix {{ValueFactoryImpl}} as much as possible* and ignore the cases where 
the static methods are still used and "hope" they never play a role for client 
code wanting to access {{BinaryDownload}}.
 ** Pro: no API change, can work short term.
 ** Con: unknown edge cases are not covered.
 # *Change the API back to the {{BinaryAccessProvider}} design*, as an 
extension of {{Session}}.
 ** Pro: no implementation refactoring needed, changes limited to a few places
 ** Con: Oak team preferred the other API approach; the API was already 
released as jackrabbit-api 2.17.5. while there are no downstream clients yet, 
reverting it would be difficult (major package version update or removing 
release artifact from maven central, ... see below)

There are two variants of 2:
 * 2a) Go fully back to a {{BinaryAccessProvider}} with both upload and 
download in one interface. But that would mean removing the new interface 
{{JackrabbitValueFactory}} again (or it's methods). Just deprecating them and 
have them linger around would be pretty awful for jackrabbit-api, which should 
be "polished". That would result in a major version change of 
{{org.apache.jackrabbit.api}} ([2.5.0 to 
3.0.0|https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-api/src/main/java/org/apache/jackrabbit/api/package-info.java]),
 which would break all existing JCR client code (in an OSGi environment). 
Otherwise, given there can be no real usage of this API as nothing released 
implements it yet, one could think of reverting the 2.17.5 release, removing it 
from maven-central and just do a clean new start from 2.17.4 with whatever API 
we want to add.

 * 2b) Introduce {{BinaryAccessProvider}} or {{BinaryDownloadProvider}} only 
for download, i.e. having a method {{getDownloadURI(Binary, 
BinaryDownloadOptions)}}. Then we would only have to remove {{BinaryDownload}}, 
but that's not a problem because it's in the new package 
{{org.apache.jackrabbit.api.binary}} for which we could freely jump from 1.0.0 
to 2.0.0 export version (unlike {{org.apache.jackrabbit.api}} in 2a., no one is 
using that package yet).

h4. My take
 * Option 1: I would feel a bit uncomfortable with such a "fingers crossed" 
approach, where we don't know the edge cases ;)
 * Option 2a: a single interface for upload (write) and download (read) would 
arguably the more intuitive approach, if we can't have both ValueFactory for 
write and Binary extension for read. But I don't know if the API revert is 
possible.
 * Option 2b: Very straightforward, but somewhat funky API in the end. Maybe it 
doesn't matter with good documentation.

Thoughts?


was (Author: alexander.klimetschek):
h4. Problem

Unfortunately, we could not commit the final agreed change this week, because 
[~mattvryan] found a blocking issue: {{BinaryDownload.getURI()}} always returns 
null because the regular code path {{node.getProperty().getBinary()}} does not 
pass the {{BlobAccessProvider}} instance through to the {{ValueFactoryImpl}} 
because it uses a [static method of ValueFactoryImpl in 
PropertyImpl.getValue()|https://github.com/apache/jackrabbit-oak/blob/905c9bd0f716778a035a708bbdb8de634f464e66/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/session/PropertyImpl.java#L252-L253].

Looking further, there are actually many static method usages of 
{{ValueFactoryImpl}} today, that all need to ensure a {{BlobAccessProvider}} is 
available.

[~mattvryan] and me think there are 2 solutions:
 # fix the static ValueFactoryImpl issue as much as possible (but ALL cases 
seem to be impossible to fix without a major Oak refactoring)
 # go back to the original API design (2a) or a slight variation thereof (2b) 
as a feasible compromise.

More details below.
h4. Details

The unit tests so far had a shortcut where it reused the {{Binary}} used for 
writing, hence no one saw the issue earlier. This has been fixed in the latest 
changes for the integration tests: 
[https://github.com/mattvryan/jackrabbit-oak/pull/12] (also includes a lot of 
improvements around the test framework itself).

This issue, that did not exist initially, and was tested, was introduced as a 
result of the refactorings during the review - which moved the API to a 
ValueFactory extension. At the same time the unit tests changed a bit, 
accidentally removing the important test case.

It is possible to fix this particular Value access case, since {{PropertyImpl}} 
already has access to the per-session {{ValueFactoryImpl}} instance, and it 
just needs to replace all the static usages left by leveraging 
{{getValueFactory()}}. Same for some other places in oak-jcr and oak-core that 
currently employ a mix of instance and static method access. [~mattvryan] did 
an attempt at this here: 
[https://github.com/mattvryan/jackrabbit-oak/commit/e022f846c0ff113c95910d26584568004beacb66]

Ideally, to ensure the {{blobAccessProvider}} reference in the 
{{ValueFactoryImpl}} is always set to the reference from the whiteboard that 
the {{SessionContext}} has, one would have to eliminate the static methods on 
{{ValueFactoryImpl}} and replace by instance methods. The class isn't doing 
proper encapsulation at this point, which is visible by the fact that 
{{NamePathMapper}}, an implementation detail the ValueFactoryImpl relies on, is 
actually all over the Oak code base.

*However, there are two static {{ValueFactoryImpl.createValue()}} methods 
retrieving a PropertyState and PropertyValue argument respectively (+ 
NamePathMapper) that are used in different places in oak, including 
oak-security-spi.* These are completely unrelated to the binary change. They 
don't have any immediate access to the {{SessionContext}} (which provides both 
the {{ValueFactoryImpl}} and {{BlobAcccessProvider}} instance). They all get a 
{{NamePathMapper}} passed in, which is essentially the same situation that a 
component reference must be available for the {{ValueFactoryImpl}} methods to 
do their work, but if you follow the call hierarchy it literally explodes. 
Meaning a lot of files all over Oak would have to be changed to actually pass 
through the {{ValueFactoryImpl}} instance instead of just {{NamePathMapper}}.

But even then, in many places it does not even get any access to the real 
{{NamePathMapper}} which {{SessionContext}} implements, but a 
{{NamePathMapper.DEFAULT}} is used (which apparently works for some of these 
cases?). It might be that some components were never designed to get access to 
the {{SessionContext}}, at least it seems it would be a massive refactoring to 
ensure that everywhere.
h4. Unsolved Binary cases

How does that affect the direct binary access feature? One example is 
[AuthorizablePropertiesImpl|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/security/user/AuthorizablePropertiesImpl.java#L108].
 This corresponds to the code path of 
{{Authorizable.getProperty("foo").getValue().getBinary()}}, which would never 
be able to return anything from {{getURI()}}. Note that it will implement 
{{BinaryDownload}}, and the API contract is that {{getURI()}} shall return null 
if the feature is not available, but not if the client used the "wrong" way of 
retrieving a {{Binary}}.

Other cases are hard to reason about: it's unclear where exactly they could end 
up being used by a JCR client. We tried to analyze it but I am not sure one can 
make guarantees... and since the static methods aren't property-type specific, 
one cannot exclude binary properties. Furthermore we have to consider future 
usage of these static methods (unless they become deprecated or removed).

FWIW, this is the reason in our original patch we did not extended the 
{{ValueFactory}} at all but used an extension interface 
({{BinaryAccessProvider}} on {{Session}}, as then access to the 
{{SessionContext}} and anything else is trivial. The idea of the 
{{ValueFactory}} as creator of values sounds good in theory, but the 
implementation in Oak using static methods creates values "out of thin air" 
with no access to other components.

Also note that this only applies to downloads (BinaryDownload.getURI()). The 
upload ({{JackrabbitValueFactory.completeBinaryUpload()}}) is never affected, 
because the JCR client code will retrieve the {{JackrabbitValueFactory}} 
instance from {{Session.getValueFactory()}}, which will always be the "good" 
instance.
h4. Options

[~mattvryan] and me identified these two options at this point. Given the huge 
effort of getting rid of the static {{ValueFactoryImpl.createValue()}} methods 
completely (a patch touching all of Oak), we excluded that one as a viable 
option - feedback welcome.
 # *Fix {{ValueFactoryImpl}} as much as possible* and ignore the cases where 
the static methods are still used and "hope" they never play a role for client 
code wanting to access {{BinaryDownload}}.
 ** Pro: no API change, can work short term.
 ** Con: unknown edge cases are not covered.
 # *Change the API back to the {{BinaryAccessProvider}} design*, as an 
extension of {{Session}}.
 ** Pro: no implementation refactoring needed, changes limited to a few places
 ** Con: Oak team preferred the other API approach; the API was already 
released as jackrabbit-api 2.17.5. while there are no downstream clients yet, 
reverting it would be difficult (major package version update or removing 
release artifact from maven central, ... see below)

There are two variants of 2:
 * 2a) Go fully back to a {{BinaryAccessProvider}} with both upload and 
download in one interface. But that would mean removing the new interface 
{{JackrabbitValueFactory}} again (or it's methods). Just deprecating them and 
have them linger around would be pretty awful for jackrabbit-api, which should 
be "polished". That would result in a major version change of 
{{org.apache.jackrabbit.api}} ([2.5.0 to 
3.0.0|https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-api/src/main/java/org/apache/jackrabbit/api/package-info.java]),
 which would break all existing JCR client code (in an OSGi environment). 
Otherwise, given there can be no real usage of this API as nothing released 
implements it yet, one could think of reverting the 2.17.5 release, removing it 
from maven-central and just do a clean new start from 2.17.4 with whatever API 
we want to add.

 * 2b) Introduce {{BinaryAccessProvider}} or {{BinaryDownloadProvider}} only 
for download, i.e. having a method {{getDownloadURI(Binary, 
BinaryDownloadOptions)}}. Then we would only have to remove {{BinaryDownload}}, 
but that's not a problem because it's in the new package 
{{org.apache.jackrabbit.api.binary}} for which we could freely jump from 1.0.0 
to 2.0.0 export version (unlike {{org.apache.jackrabbit.api}} in 2a., no one is 
using that package yet).

h4. My take
 * Option 1: I would feel a bit uncomfortable with such a "fingers crossed" 
approach, where we don't know the edge cases ;)
 * Option 2a: a single interface for upload (write) and download (read) would 
arguably the more intuitive approach, if we can't have both ValueFactory for 
write and Binary extension for read. But I don't know if the API revert is 
possible.
 * Option 2b: Very straightforward, but somewhat funky API in the end. Maybe it 
doesn't matter with good documentation.

Thoughts?

> Direct Binary Access
> --------------------
>
>                 Key: OAK-7569
>                 URL: https://issues.apache.org/jira/browse/OAK-7569
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: api, blob-cloud, blob-cloud-azure, blob-plugins
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
>         Attachments: OAK-7569-api-javadoc-improvements.patch
>
>
> Provide a direct binary access feature to Oak which allows an authenticated 
> client to create or download blobs directly to/from the blob store, assuming 
> the authenticated user has appropriate permission to do so. The primary value 
> of this feature is that the I/O of transferring large binary files to or from 
> the blob store can be offloaded entirely from Oak and performed directly 
> between a client application and the blob store.
> This feature is described in more detail [on the Oak 
> wiki|https://wiki.apache.org/jackrabbit/Direct%20Binary%20Access].
> This feature is similar in functionality to OAK-6575.  It adds the capability 
> to also upload directly to storage via preauthorized URLs in addition to 
> downloading directly from storage via preauthorized URLs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to