> This proposal introduces a huge leak of abstractions and has deep security implications.
I understand the leak of abstractions concern. However, I would like to understand the security concern a bit more. One way I can see it causing a security problem is if some malicious code running in the same JVM does bad things with the file handle. Do note that the file handle would not get exposed via any remoting API we currently support. And if malicious code is already running in the same JVM, then security is already breached and that code can use reflection to access internal details anyway. So if there is any other possible security concern, I would like to discuss it.

Coming to the use cases:

Usecase A - Image rendition generation
-----------------------------------------------------

We have some bigger deployments where lots of images get uploaded to the repository, and some conversions (rendition generation) are performed by OS-specific native executables. Such programs work directly on a file handle. Without this change we currently need to first spool the file content into some temporary location and then pass that to the other program. This adds unnecessary overhead, which can be avoided when a FileDataStore is used and we can provide direct access to the file (a rough sketch follows further below).

Usecase B - Efficient replication across regions in S3
----------------------------------------------------------------------

This is for an AEM-based setup running on Oak with the S3DataStore. We have a global deployment where the author instance runs in one region and binary content has to be distributed to publish instances running in other regions. The DataStore size is huge, say 100TB, and for efficient operation we need to use binary-less replication. In most cases only a very small subset of the binary content needs to be present in the other regions. The current way to support that (via a shared DataStore) would involve synchronizing the S3 bucket across all such regions, which would increase the storage cost considerably. Instead, the plan is to replicate the specific assets via an S3 copy operation. That would ensure big assets can be copied efficiently at the S3 level, and it requires direct access to the S3 object (again, a sketch follows below).

Again, in all such cases one can always resort to the current level of support, i.e. copy all the content via an InputStream into some temporary store and then use that. But that adds considerable overhead when assets are 100MB or larger. The proposed approach would allow client code to do this efficiently, depending on the underlying storage capability.

> To me sounds like breaching the JCR and NodeState layers to directly
> manipulate NodeStore binaries (from the DataStore), e.g. to perform smart
> replication across different instances, but imho the right way to address
> that is extending one of the current DataStore implementations or create a
> new one.

The original proposed approach in OAK-1963 was like that, i.e. introduce this access method on BlobStore, working on a reference. But in that case client code would need to deal with the BlobStore API. In either case, access to the actual binary storage data would be required.
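To make Usecase A a bit more concrete, below is a minimal sketch of what the client code could look like. The adaptTo-style call in the comments is exactly the hypothetical API being discussed here (nothing Oak ships today); the active code is the spooling we are forced to do at the moment.

    import java.io.File;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    import javax.jcr.Binary;
    import javax.jcr.Node;

    public class RenditionSourceResolver {

        /** Returns a File the native rendition tool can work on. */
        public File fileForRendition(Node asset) throws Exception {
            Binary binary = asset.getProperty("jcr:content/jcr:data").getBinary();

            // Proposed path (hypothetical API): if the Binary is backed by a
            // FileDataStore, hand the underlying file straight to the converter.
            //
            //   File file = ((AdaptableBinary) binary).adaptTo(File.class);
            //   if (file != null) {
            //       return file;
            //   }

            // Current path: spool the stream into a temporary file first -
            // the overhead this proposal is trying to avoid.
            Path tmp = Files.createTempFile("rendition-src-", ".bin");
            try (InputStream in = binary.getStream()) {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            }
            return tmp.toFile();
        }
    }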
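And a similar sketch for Usecase B, using the AWS SDK for Java. How the S3 object key is resolved from a Binary is again the hypothetical part (that is precisely what the proposed API would expose); bucket names and regions are placeholders.

    import com.amazonaws.regions.Regions;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.CopyObjectRequest;

    public class CrossRegionAssetReplicator {

        /**
         * Server-side copy of one asset's S3 object into the bucket of a
         * publish instance in another region, so the 100MB+ payload never
         * flows through the Oak JVM.
         */
        public void replicate(String s3Key) {
            // Client configured for the *destination* region; it pulls the
            // object from the source bucket via a single COPY request.
            AmazonS3 target = AmazonS3ClientBuilder.standard()
                    .withRegion(Regions.EU_WEST_1) // example publish region
                    .build();

            target.copyObject(new CopyObjectRequest(
                    "author-datastore-bucket",  // source bucket (author region)
                    s3Key,                      // key resolved from the Binary (hypothetical API)
                    "publish-datastore-bucket", // destination bucket
                    s3Key));
            // Note: objects larger than 5 GB would need a multipart copy instead.
        }
    }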
Chetan Mehrotra

On Thu, May 5, 2016 at 2:49 PM, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:
> +1 to Francesco's concerns, exposing the location of a binary at the
> application level doesn't sound good from a security perspective.
> To me sounds like breaching the JCR and NodeState layers to directly
> manipulate NodeStore binaries (from the DataStore), e.g. to perform smart
> replication across different instances, but imho the right way to address
> that is extending one of the current DataStore implementations or create a
> new one.
> I am also concerned that this Adaptable pattern would open room for other
> such hacks into the stack.
>
> My 2 cents,
> Tommaso
>
>
> On Thursday, May 5, 2016 at 11:00, Francesco Mari <mari.france...@gmail.com> wrote:
>
> > This proposal introduces a huge leak of abstractions and has deep
> > security implications.
> >
> > I guess that the reason for this proposal is that some users of Oak
> > would like to perform some operations on binaries in a more performant
> > way by leveraging the way those binaries are stored. If this is the
> > case, I suggest those users evaluate an applicative solution
> > implemented on top of the JCR API.
> >
> > If a user needs to store some important binary data (files, images,
> > etc.) in an S3 bucket or on the file system for performance reasons,
> > this shouldn't affect how Oak handles blobs internally. If some assets
> > are of special interest for the user, then the user should bypass Oak
> > and take care of the storage of those assets directly. Oak can be used
> > to store *references* to those assets, which can be used in user code
> > to manipulate the assets in their own business logic.
> >
> > If the scenario I outlined is not what inspired this proposal, I would
> > like to know more about the reasons why this proposal was brought up.
> > Which problems are we going to solve with this API? Is there a more
> > concrete use case that we can use as a driving example?
> >
> > 2016-05-05 10:06 GMT+02:00 Davide Giannella <dav...@apache.org>:
> >
> > > On 04/05/2016 17:37, Ian Boston wrote:
> > > > Hi,
> > > > If the File or URL is writable, will writing to the location cause
> > > > issues for Oak?
> > > > IIRC some Oak DS implementations use a digest of the content to
> > > > determine the location in the DS, so changing the content via Oak
> > > > will change the location, but changing the content via the File or
> > > > URL won't. If I didn't remember correctly, then ignore the concern.
> > > > Fully supportive of the approach, as a consumer of Oak. The
> > > > locations will almost certainly leak outside the context of an Oak
> > > > session, so the API contract should make it clear that code using a
> > > > direct location needs to behave responsibly.
> > >
> > > It's a reasonable concern and I'm not into the details of the
> > > implementation. It's worth keeping in mind, though, and remembering
> > > that if we want to adapt to URL or File we may have to come up with
> > > some sort of read-only version of those.
> > >
> > > For the File class, IIRC, we could force/use the setReadOnly() and
> > > setWritable() methods. I remember those being quite expensive in time
> > > though.
> > >
> > > Davide