[ https://issues.apache.org/jira/browse/OAK-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569802#comment-16569802 ]
Alexander Klimetschek edited comment on OAK-7569 at 8/6/18 6:56 AM: -------------------------------------------------------------------- h4. Problem Unfortunately, we could not commit the final agreed change this week, because [~mattvryan] found a blocking issue: {{BinaryDownload.getURI()}} always returns null because the regular code path {{node.getProperty().getBinary()}} does not pass the {{BlobAccessProvider}} instance through to the {{ValueFactoryImpl}} because it uses a [static method of ValueFactoryImpl in PropertyImpl.getValue()|https://github.com/apache/jackrabbit-oak/blob/905c9bd0f716778a035a708bbdb8de634f464e66/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/session/PropertyImpl.java#L252-L253]. Looking further, there are actually many static method usages of {{ValueFactoryImpl}} today, that all need to ensure a {{BlobAccessProvider}} is available. [~mattvryan] and me think there are 2 solutions: # fix the static ValueFactoryImpl issue as much as possible (but ALL cases seem to be impossible to fix without a major Oak refactoring) # go back to the original API design (2a) or a slight variation thereof (2b) as a feasible compromise. More details below. h4. Details The unit tests so far had a shortcut where it reused the {{Binary}} used for writing, hence no one saw the issue earlier. This has been fixed in the latest changes for the integration tests: [https://github.com/mattvryan/jackrabbit-oak/pull/12] (also includes a lot of improvements around the test framework itself). This issue, that did not exist initially, and was tested, was introduced as a result of the refactorings during the review - which moved the API to a ValueFactory extension. At the same time the unit tests changed a bit, accidentally removing the important test case. It is possible to fix this particular Value access case, since {{PropertyImpl}} already has access to the per-session {{ValueFactoryImpl}} instance, and it just needs to replace all the static usages left by leveraging {{getValueFactory()}}. Same for some other places in oak-jcr and oak-core that currently employ a mix of instance and static method access. [~mattvryan] did an attempt at this here: [https://github.com/mattvryan/jackrabbit-oak/commit/e022f846c0ff113c95910d26584568004beacb66] Ideally, to ensure the {{blobAccessProvider}} reference in the {{ValueFactoryImpl}} is always set to the reference from the whiteboard that the {{SessionContext}} has, one would have to eliminate the static methods on {{ValueFactoryImpl}} and replace by instance methods. The class isn't doing proper encapsulation at this point, which is visible by the fact that {{NamePathMapper}}, an implementation detail the ValueFactoryImpl relies on, is actually all over the Oak code base. *However, there are two static {{ValueFactoryImpl.createValue()}} methods retrieving a PropertyState and PropertyValue argument respectively (+ NamePathMapper) that are used in different places in oak, including oak-security-spi.* These are completely unrelated to the binary change. They don't have any immediate access to the {{SessionContext}} (which provides both the {{ValueFactoryImpl}} and {{BlobAcccessProvider}} instance). They all get a {{NamePathMapper}} passed in, which is essentially the same situation that a component reference must be available for the {{ValueFactoryImpl}} methods to do their work, but if you follow the call hierarchy it literally explodes. Meaning a lot of files all over Oak would have to be changed to actually pass through the {{ValueFactoryImpl}} instance instead of just {{NamePathMapper}}. But even then, in many places it does not even get any access to the real {{NamePathMapper}} which {{SessionContext}} implements, but a {{NamePathMapper.DEFAULT}} is used (which apparently works for some of these cases?). It might be that some components were never designed to get access to the {{SessionContext}}, at least it seems it would be a massive refactoring to ensure that everywhere. h4. Unsolved Binary cases How does that affect the direct binary access feature? One example is [AuthorizablePropertiesImpl|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/security/user/AuthorizablePropertiesImpl.java#L108]. This corresponds to the code path of {{Authorizable.getProperty("foo").getValue().getBinary()}}, which would never be able to return anything from {{getURI()}}. Note that it will implement {{BinaryDownload}}, and the API contract is that {{getURI()}} shall return null if the feature is not available, but not if the client used the "wrong" way of retrieving a {{Binary}}. Other cases are hard to reason about: it's unclear where exactly they could end up being used by a JCR client. We tried to analyze it but I am not sure one can make guarantees... and since the static methods aren't property-type specific, one cannot exclude binary properties. Furthermore we have to consider future usage of these static methods (unless they become deprecated or removed). FWIW, this is the reason in our original patch we did not extended the {{ValueFactory}} at all but used an extension interface: {{BinaryAccessProvider}} on {{Session}}, as then access to the {{SessionContext}} and anything else is trivial. The idea of the {{ValueFactory}} as creator of values sounds good in theory, but the implementation in Oak using static methods creates values "out of thin air" with no access to other components. Except for NamePathMapper, which apparently was there since the beginning of Oak and hence was incorporated in all the code over time. Also note that this only applies to downloads (BinaryDownload.getURI()). The upload ({{JackrabbitValueFactory.completeBinaryUpload()}}) is never affected, because the JCR client code will retrieve the {{JackrabbitValueFactory}} instance from {{Session.getValueFactory()}}, which will always be the "good" instance. h4. Options [~mattvryan] and me identified these two options at this point. Given the huge effort of getting rid of the static {{ValueFactoryImpl.createValue()}} methods completely (a patch touching all of Oak), we excluded that one as a viable option - feedback welcome. # *Fix {{ValueFactoryImpl}} as much as possible* and ignore the cases where the static methods are still used and "hope" they never play a role for client code wanting to access {{BinaryDownload}}. ** Pro: no API change, can work short term. ** Con: unknown edge cases are not covered. # *Change the API back to the {{BinaryAccessProvider}} design*, as an extension of {{Session}}. ** Pro: no implementation refactoring needed, changes limited to a few places ** Con: Oak team preferred the other API approach; the API was already released as jackrabbit-api 2.17.5. while there are no downstream clients yet, reverting it would be difficult (major package version update or removing release artifact from maven central, ... see below) There are two variants of 2: * 2a) Go fully back to a {{BinaryAccessProvider}} with both upload and download in one interface. But that would mean removing the new interface {{JackrabbitValueFactory}} again (or it's methods). Just deprecating them and have them linger around would be pretty awful for jackrabbit-api, which should be "polished". That would result in a major version change of {{org.apache.jackrabbit.api}} ([2.5.0 to 3.0.0|https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-api/src/main/java/org/apache/jackrabbit/api/package-info.java]), which would break all existing JCR client code (in an OSGi environment). Otherwise, given there can be no real usage of this API as nothing released implements it yet, one could think of reverting the 2.17.5 release, removing it from maven-central and just do a clean new start from 2.17.4 with whatever API we want to add. * 2b) Introduce {{BinaryAccessProvider}} or {{BinaryDownloadProvider}} only for download, i.e. having a method {{getDownloadURI(Binary, BinaryDownloadOptions)}}. Then we would only have to remove {{BinaryDownload}}, but that's not a problem because it's in the new package {{org.apache.jackrabbit.api.binary}} for which we could freely jump from 1.0.0 to 2.0.0 export version (unlike {{org.apache.jackrabbit.api}} in 2a., no one is using that package yet). h4. My take * Option 1: I would feel a bit uncomfortable with such a "fingers crossed" approach, where we don't know the edge cases ;) * Option 2a: a single interface for upload (write) and download (read) would arguably the more intuitive approach, if we can't have both ValueFactory for write and Binary extension for read. But I don't know if the API revert is possible. * Option 2b: Very straightforward, but somewhat funky API in the end. Maybe it doesn't matter with good documentation. Thoughts? was (Author: alexander.klimetschek): h4. Problem Unfortunately, we could not commit the final agreed change this week, because [~mattvryan] found a blocking issue: {{BinaryDownload.getURI()}} always returns null because the regular code path {{node.getProperty().getBinary()}} does not pass the {{BlobAccessProvider}} instance through to the {{ValueFactoryImpl}} because it uses a [static method of ValueFactoryImpl in PropertyImpl.getValue()|https://github.com/apache/jackrabbit-oak/blob/905c9bd0f716778a035a708bbdb8de634f464e66/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/session/PropertyImpl.java#L252-L253]. Looking further, there are actually many static method usages of {{ValueFactoryImpl}} today, that all need to ensure a {{BlobAccessProvider}} is available. [~mattvryan] and me think there are 2 solutions: # fix the static ValueFactoryImpl issue as much as possible (but ALL cases seem to be impossible to fix without a major Oak refactoring) # go back to the original API design (2a) or a slight variation thereof (2b) as a feasible compromise. More details below. h4. Details The unit tests so far had a shortcut where it reused the {{Binary}} used for writing, hence no one saw the issue earlier. This has been fixed in the latest changes for the integration tests: [https://github.com/mattvryan/jackrabbit-oak/pull/12] (also includes a lot of improvements around the test framework itself). This issue, that did not exist initially, and was tested, was introduced as a result of the refactorings during the review - which moved the API to a ValueFactory extension. At the same time the unit tests changed a bit, accidentally removing the important test case. It is possible to fix this particular Value access case, since {{PropertyImpl}} already has access to the per-session {{ValueFactoryImpl}} instance, and it just needs to replace all the static usages left by leveraging {{getValueFactory()}}. Same for some other places in oak-jcr and oak-core that currently employ a mix of instance and static method access. [~mattvryan] did an attempt at this here: [https://github.com/mattvryan/jackrabbit-oak/commit/e022f846c0ff113c95910d26584568004beacb66] Ideally, to ensure the {{blobAccessProvider}} reference in the {{ValueFactoryImpl}} is always set to the reference from the whiteboard that the {{SessionContext}} has, one would have to eliminate the static methods on {{ValueFactoryImpl}} and replace by instance methods. The class isn't doing proper encapsulation at this point, which is visible by the fact that {{NamePathMapper}}, an implementation detail the ValueFactoryImpl relies on, is actually all over the Oak code base. *However, there are two static {{ValueFactoryImpl.createValue()}} methods retrieving a PropertyState and PropertyValue argument respectively (+ NamePathMapper) that are used in different places in oak, including oak-security-spi.* These are completely unrelated to the binary change. They don't have any immediate access to the {{SessionContext}} (which provides both the {{ValueFactoryImpl}} and {{BlobAcccessProvider}} instance). They all get a {{NamePathMapper}} passed in, which is essentially the same situation that a component reference must be available for the {{ValueFactoryImpl}} methods to do their work, but if you follow the call hierarchy it literally explodes. Meaning a lot of files all over Oak would have to be changed to actually pass through the {{ValueFactoryImpl}} instance instead of just {{NamePathMapper}}. But even then, in many places it does not even get any access to the real {{NamePathMapper}} which {{SessionContext}} implements, but a {{NamePathMapper.DEFAULT}} is used (which apparently works for some of these cases?). It might be that some components were never designed to get access to the {{SessionContext}}, at least it seems it would be a massive refactoring to ensure that everywhere. h4. Unsolved Binary cases How does that affect the direct binary access feature? One example is [AuthorizablePropertiesImpl|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/security/user/AuthorizablePropertiesImpl.java#L108]. This corresponds to the code path of {{Authorizable.getProperty("foo").getValue().getBinary()}}, which would never be able to return anything from {{getURI()}}. Note that it will implement {{BinaryDownload}}, and the API contract is that {{getURI()}} shall return null if the feature is not available, but not if the client used the "wrong" way of retrieving a {{Binary}}. Other cases are hard to reason about: it's unclear where exactly they could end up being used by a JCR client. We tried to analyze it but I am not sure one can make guarantees... and since the static methods aren't property-type specific, one cannot exclude binary properties. Furthermore we have to consider future usage of these static methods (unless they become deprecated or removed). FWIW, this is the reason in our original patch we did not extended the {{ValueFactory}} at all but used an extension interface ({{BinaryAccessProvider}} on {{Session}}, as then access to the {{SessionContext}} and anything else is trivial. The idea of the {{ValueFactory}} as creator of values sounds good in theory, but the implementation in Oak using static methods creates values "out of thin air" with no access to other components. Also note that this only applies to downloads (BinaryDownload.getURI()). The upload ({{JackrabbitValueFactory.completeBinaryUpload()}}) is never affected, because the JCR client code will retrieve the {{JackrabbitValueFactory}} instance from {{Session.getValueFactory()}}, which will always be the "good" instance. h4. Options [~mattvryan] and me identified these two options at this point. Given the huge effort of getting rid of the static {{ValueFactoryImpl.createValue()}} methods completely (a patch touching all of Oak), we excluded that one as a viable option - feedback welcome. # *Fix {{ValueFactoryImpl}} as much as possible* and ignore the cases where the static methods are still used and "hope" they never play a role for client code wanting to access {{BinaryDownload}}. ** Pro: no API change, can work short term. ** Con: unknown edge cases are not covered. # *Change the API back to the {{BinaryAccessProvider}} design*, as an extension of {{Session}}. ** Pro: no implementation refactoring needed, changes limited to a few places ** Con: Oak team preferred the other API approach; the API was already released as jackrabbit-api 2.17.5. while there are no downstream clients yet, reverting it would be difficult (major package version update or removing release artifact from maven central, ... see below) There are two variants of 2: * 2a) Go fully back to a {{BinaryAccessProvider}} with both upload and download in one interface. But that would mean removing the new interface {{JackrabbitValueFactory}} again (or it's methods). Just deprecating them and have them linger around would be pretty awful for jackrabbit-api, which should be "polished". That would result in a major version change of {{org.apache.jackrabbit.api}} ([2.5.0 to 3.0.0|https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-api/src/main/java/org/apache/jackrabbit/api/package-info.java]), which would break all existing JCR client code (in an OSGi environment). Otherwise, given there can be no real usage of this API as nothing released implements it yet, one could think of reverting the 2.17.5 release, removing it from maven-central and just do a clean new start from 2.17.4 with whatever API we want to add. * 2b) Introduce {{BinaryAccessProvider}} or {{BinaryDownloadProvider}} only for download, i.e. having a method {{getDownloadURI(Binary, BinaryDownloadOptions)}}. Then we would only have to remove {{BinaryDownload}}, but that's not a problem because it's in the new package {{org.apache.jackrabbit.api.binary}} for which we could freely jump from 1.0.0 to 2.0.0 export version (unlike {{org.apache.jackrabbit.api}} in 2a., no one is using that package yet). h4. My take * Option 1: I would feel a bit uncomfortable with such a "fingers crossed" approach, where we don't know the edge cases ;) * Option 2a: a single interface for upload (write) and download (read) would arguably the more intuitive approach, if we can't have both ValueFactory for write and Binary extension for read. But I don't know if the API revert is possible. * Option 2b: Very straightforward, but somewhat funky API in the end. Maybe it doesn't matter with good documentation. Thoughts? > Direct Binary Access > -------------------- > > Key: OAK-7569 > URL: https://issues.apache.org/jira/browse/OAK-7569 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: api, blob-cloud, blob-cloud-azure, blob-plugins > Reporter: Matt Ryan > Assignee: Matt Ryan > Priority: Major > Attachments: OAK-7569-api-javadoc-improvements.patch > > > Provide a direct binary access feature to Oak which allows an authenticated > client to create or download blobs directly to/from the blob store, assuming > the authenticated user has appropriate permission to do so. The primary value > of this feature is that the I/O of transferring large binary files to or from > the blob store can be offloaded entirely from Oak and performed directly > between a client application and the blob store. > This feature is described in more detail [on the Oak > wiki|https://wiki.apache.org/jackrabbit/Direct%20Binary%20Access]. > This feature is similar in functionality to OAK-6575. It adds the capability > to also upload directly to storage via preauthorized URLs in addition to > downloading directly from storage via preauthorized URLs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)