> This proposal introduces a huge leak of abstractions and has deep security implications.
I understand the leak of abstractions concern. However, I would like to understand the security concern a bit more. One way I can see it causing a security problem is if some malicious code running in the same JVM does bad things with the file handle. Do note that the file handle would not get exposed via any remoting API we currently support. And if malicious code is already running in the same JVM, then security is already breached and that code can use reflection to access internal details anyway. So if there is any other possible security concern, I would like to discuss it.

Coming to the use cases:

Usecase A - Image rendition generation
-----------------------------------------------------

We have some bigger deployments where lots of images get uploaded to the repository, and some conversions (rendition generation) are performed by OS-specific native executables. Such programs work directly on a file handle. Without this change we currently need to first spool the file content into some temporary location and then pass that to the other program. This adds unnecessary overhead, which can be avoided when a FileDataStore is used and we can provide direct access to the file (a rough sketch follows further below).

Usecase B - Efficient replication across regions in S3
----------------------------------------------------------------------

This is for an AEM-based setup running on Oak with the S3DataStore. We have a global deployment where the author instance runs in one region and binary content has to be distributed to publish instances running in other regions. The DataStore size is huge, say 100TB, and for efficient operation we need to use binary-less replication. In most cases only a very small subset of the binary content needs to be present in the other regions. The current way to support that (via a shared DataStore) would involve synchronizing the S3 bucket across all such regions, which would increase the storage cost considerably. Instead, the plan is to replicate the specific assets via an S3 copy operation. That would ensure big assets can be copied efficiently at the S3 level, and it requires direct access to the S3 object (again, a sketch follows below).

Again, in all such cases one can always resort to the current level of support, i.e. copy all the content via an InputStream into some temporary store and then use that. But that adds considerable overhead when assets are 100MB or larger. The proposed approach would allow client code to do this efficiently, depending on the underlying storage capability.

> To me sounds like breaching the JCR and NodeState layers to directly
> manipulate NodeStore binaries (from the DataStore), e.g. to perform smart
> replication across different instances, but imho the right way to address
> that is extending one of the current DataStore implementations or create a
> new one.

The original proposed approach in OAK-1963 was like that, i.e. introduce this access method on BlobStore, working on a reference. But in that case client code would need to deal with the BlobStore API. In either case, access to the actual binary storage data would be required.
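To make Usecase A a bit more concrete, below is a minimal sketch of what the client code could look like. The adaptTo-style call in the comments is exactly the hypothetical API being discussed here (nothing Oak ships today); the active code is the spooling we are forced to do at the moment.

    import java.io.File;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    import javax.jcr.Binary;
    import javax.jcr.Node;

    public class RenditionSourceResolver {

        /** Returns a File the native rendition tool can work on. */
        public File fileForRendition(Node asset) throws Exception {
            Binary binary = asset.getProperty("jcr:content/jcr:data").getBinary();

            // Proposed path (hypothetical API): if the Binary is backed by a
            // FileDataStore, hand the underlying file straight to the converter.
            //
            //   File file = ((AdaptableBinary) binary).adaptTo(File.class);
            //   if (file != null) {
            //       return file;
            //   }

            // Current path: spool the stream into a temporary file first -
            // the overhead this proposal is trying to avoid.
            Path tmp = Files.createTempFile("rendition-src-", ".bin");
            try (InputStream in = binary.getStream()) {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            }
            return tmp.toFile();
        }
    }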
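And a similar sketch for Usecase B, using the AWS SDK for Java. How the S3 object key is resolved from a Binary is again the hypothetical part (that is precisely what the proposed API would expose); bucket names and regions are placeholders.

    import com.amazonaws.regions.Regions;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.CopyObjectRequest;

    public class CrossRegionAssetReplicator {

        /**
         * Server-side copy of one asset's S3 object into the bucket of a
         * publish instance in another region, so the 100MB+ payload never
         * flows through the Oak JVM.
         */
        public void replicate(String s3Key) {
            // Client configured for the *destination* region; it pulls the
            // object from the source bucket via a single COPY request.
            AmazonS3 target = AmazonS3ClientBuilder.standard()
                    .withRegion(Regions.EU_WEST_1) // example publish region
                    .build();

            target.copyObject(new CopyObjectRequest(
                    "author-datastore-bucket",  // source bucket (author region)
                    s3Key,                      // key resolved from the Binary (hypothetical API)
                    "publish-datastore-bucket", // destination bucket
                    s3Key));
            // Note: objects larger than 5 GB would need a multipart copy instead.
        }
    }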
Chetan Mehrotra

On Thu, May 5, 2016 at 2:49 PM, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:
> +1 to Francesco's concerns, exposing the location of a binary at the
> application level doesn't sound good from a security perspective.
> To me sounds like breaching the JCR and NodeState layers to directly
> manipulate NodeStore binaries (from the DataStore), e.g. to perform smart
> replication across different instances, but imho the right way to address
> that is extending one of the current DataStore implementations or create a
> new one.
> I am also concerned that this Adaptable pattern would open room for other
> such hacks into the stack.
>
> My 2 cents,
> Tommaso
>
>
> On Thursday, May 5, 2016 at 11:00, Francesco Mari <mari.france...@gmail.com> wrote:
>
> > This proposal introduces a huge leak of abstractions and has deep
> > security implications.
> >
> > I guess that the reason for this proposal is that some users of Oak
> > would like to perform some operations on binaries in a more performant
> > way by leveraging the way those binaries are stored. If this is the
> > case, I suggest those users evaluate an applicative solution
> > implemented on top of the JCR API.
> >
> > If a user needs to store some important binary data (files, images,
> > etc.) in an S3 bucket or on the file system for performance reasons,
> > this shouldn't affect how Oak handles blobs internally. If some assets
> > are of special interest for the user, then the user should bypass Oak
> > and take care of the storage of those assets directly. Oak can be used
> > to store *references* to those assets, which can be used in user code
> > to manipulate the assets in their own business logic.
> >
> > If the scenario I outlined is not what inspired this proposal, I would
> > like to know more about the reasons why this proposal was brought up.
> > Which problems are we going to solve with this API? Is there a more
> > concrete use case that we can use as a driving example?
> >
> > 2016-05-05 10:06 GMT+02:00 Davide Giannella <dav...@apache.org>:
> >
> > > On 04/05/2016 17:37, Ian Boston wrote:
> > > > Hi,
> > > > If the File or URL is writable, will writing to the location cause
> > > > issues for Oak?
> > > > IIRC some Oak DS implementations use a digest of the content to
> > > > determine the location in the DS, so changing the content via Oak
> > > > will change the location, but changing the content via the File or
> > > > URL won't. If I didn't remember correctly, then ignore the concern.
> > > > Fully supportive of the approach, as a consumer of Oak. The
> > > > locations will almost certainly leak outside the context of an Oak
> > > > session, so the API contract should make it clear that code using a
> > > > direct location needs to behave responsibly.
> > >
> > > It's a reasonable concern and I'm not into the details of the
> > > implementation. It's worth keeping in mind, though, and remembering
> > > that if we want to adapt to URL or File we may have to come up with
> > > some sort of read-only version of those.
> > >
> > > For the File class, IIRC, we could force/use the setReadOnly() and
> > > setWritable() methods. I remember those being quite expensive in time
> > > though.
> > >
> > > Davide