The security concern is quite easy to explain: it's a bypass of our
security model. Imagine that, using a session with the appropriate
privileges, a user accesses a Blob and adapts it to a file handle, an S3
bucket or a URL. That code then passes the reference to another piece of
code that modifies the data directly, even though - in the same
deployment - it shouldn't be able to access the Blob instance in the
first place.
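
For illustration, a minimal sketch of that scenario, assuming a
hypothetical adaptTo() operation on Binary (the adaptation call, the
paths and the component are invented for the example):

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import javax.jcr.Binary;
    import javax.jcr.Session;

    void leak(Session session) throws Exception {
        // Privileged code: this session is allowed to read the Blob.
        Binary binary = session.getNode("/content/asset")
                .getProperty("jcr:data").getBinary();
        File file = adaptTo(binary, File.class); // hypothetical adaptation

        // The raw reference escapes to code with no session at all.
        overwrite(file);
    }

    void overwrite(File file) throws IOException {
        // No access control applies here: a direct write silently
        // corrupts the DataStore record behind Oak's back.
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(0);
        }
    }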

In addition to that, I'm very concerned about the correctness of this
solution. In both the use cases you mentioned above, you assume that the
leaked reference is only used to read the data. The truth is that, once a
reference leaks, we can't be sure that we are the only agent managing the
data. We would have to program defensively, because we would - as a matter
of fact - be sharing the management of the data with an arbitrary amount
of user code. I don't even know if it's possible to anticipate every
single thing that can go wrong.

In both use cases, the customer is coupling the data with the most
appropriate storage solution for their business case. In such a scenario,
customer code - and not Oak - should be responsible for the management of
that data. Oak can still be used to store references to that data - paths
on the file system, the ID of the S3 bucket or the URI of the resource.
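
As a minimal sketch, storing a reference instead of the binary itself
(the paths, property name and URI are invented for the example):

    import javax.jcr.Node;
    import javax.jcr.RepositoryException;
    import javax.jcr.Session;

    void storeReference(Session session) throws RepositoryException {
        // Customer code owns the data and its storage solution;
        // Oak only keeps a pointer to it.
        Node asset = session.getNode("/content/asset");
        asset.setProperty("externalUri", "s3://my-bucket/assets/img-1234.png");
        session.save();
    }

    String resolveReference(Session session) throws RepositoryException {
        // Later, customer business logic resolves the pointer itself.
        return session.getNode("/content/asset")
                .getProperty("externalUri").getString();
    }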

2016-05-05 12:38 GMT+02:00 Chetan Mehrotra <chetan.mehro...@gmail.com>:

> > This proposal introduces a huge leak of abstractions and has deep
> > security implications.
>
> I understand the leak of abstractions concern. However, I would like to
> understand the security concern a bit more.
>
> One way I can think of it causing a security concern is if you have some
> malicious code running in the same JVM, which can then do bad things with
> the file handle. Do note that the file handle would not get exposed via
> any remoting API we currently support. In that case, if malicious code is
> already running in the same JVM, then security is already breached and the
> code can anyway make use of reflection to access internal details.
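>
> For example, a minimal sketch of what co-located malicious code could
> already do today with reflection (the field name is invented, just to
> illustrate the point):
>
>     import java.lang.reflect.Field;
>     import javax.jcr.Binary;
>
>     Object stealInternals(Binary binary) throws Exception {
>         // Works regardless of any API-level restriction, as long as
>         // the code runs in the same JVM without a SecurityManager.
>         Field field = binary.getClass().getDeclaredField("blobId");
>         field.setAccessible(true);
>         return field.get(binary);
>     }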
>
> So if there is any other possible security concern, I would like to
> discuss it.
>
> Coming to the use cases
>
> Use case A - Image rendition generation
> -----------------------------------------------------
>
> We have some bigger deployments where lots of images get uploaded to the
> repository, and there are some conversions (rendition generation) which
> are performed by OS-specific native executables. Such programs work
> directly on file handles. Without this change we currently need to first
> spool the file content into some temporary location and then pass that to
> the other program. This adds unnecessary overhead, which could be avoided
> when a FileDataStore is being used, where we can provide direct access to
> the file, as sketched below.
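>
> As a rough sketch, this is the kind of spooling we currently have to do
> (the tool name "convert" is just an example of a native executable):
>
>     import java.io.File;
>     import java.io.InputStream;
>     import java.nio.file.Files;
>     import java.nio.file.StandardCopyOption;
>     import javax.jcr.Binary;
>     import javax.jcr.Node;
>
>     void render(Node node) throws Exception {
>         Binary binary = node.getProperty("jcr:data").getBinary();
>         File tmp = File.createTempFile("rendition", ".img");
>         try (InputStream in = binary.getStream()) {
>             // Full copy of a potentially large binary, only so that
>             // the native tool can see it as a file.
>             Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
>         }
>         new ProcessBuilder("convert", tmp.getAbsolutePath(), "out.png")
>                 .inheritIO().start().waitFor();
>     }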
>
> Use case B - Efficient replication across regions in S3
> ----------------------------------------------------------------------
>
> This is for an AEM-based setup which is running on Oak with an
> S3DataStore. There we have a global deployment where the author instance
> runs in one region and binary content has to be distributed to publish
> instances running in different regions. The DataStore size is huge, say
> 100 TB, and for efficient operation we need to use binary-less
> replication. In most cases only a very small subset of the binary content
> would need to be present in the other regions. The current way to support
> that (via a shared DataStore) would involve synchronizing the S3 bucket
> across all such regions, which would increase the storage cost
> considerably.
>
> Instead of that, the plan is to replicate the specific assets via an S3
> copy operation. This would ensure that big assets can be copied
> efficiently at the S3 level, and that would require direct access to the
> S3 object.
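>
> A minimal sketch of such a copy with the AWS SDK (the bucket names are
> made up; in practice the key would come from the DataStore reference):
>
>     import com.amazonaws.services.s3.AmazonS3;
>     import com.amazonaws.services.s3.AmazonS3ClientBuilder;
>
>     void replicate(String blobKey) {
>         AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();
>         // Server-side copy: the 100MB+ object never travels through
>         // the JVM, S3 moves the bytes between buckets itself.
>         s3.copyObject("author-datastore", blobKey,
>                 "publish-eu-datastore", blobKey);
>     }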
>
> Again, in all such cases one can always resort to the currently supported
> approach, i.e. copy all the content via an InputStream into some temporary
> store and then use that. But that would add considerable overhead when
> assets are 100 MB or more in size. So the proposed approach would allow
> client code to do this efficiently, depending on the underlying storage
> capability.
>
> > To me it sounds like breaching the JCR and NodeState layers to directly
> > manipulate NodeStore binaries (from the DataStore), e.g. to perform
> > smart replication across different instances, but imho the right way to
> > address that is extending one of the current DataStore implementations
> > or creating a new one.
>
> The originally proposed approach in OAK-1963 was like that, i.e. introduce
> this access method on the BlobStore, working on a reference. But in that
> case client code would need to deal with the BlobStore API. In either
> case, access to the actual binary storage data would be required.
>
> Chetan Mehrotra
>
> On Thu, May 5, 2016 at 2:49 PM, Tommaso Teofili
> <tommaso.teof...@gmail.com> wrote:
>
> > +1 to Francesco's concerns, exposing the location of a binary at the
> > application level doesn't sound good from a security perspective.
> > To me it sounds like breaching the JCR and NodeState layers to directly
> > manipulate NodeStore binaries (from the DataStore), e.g. to perform
> > smart replication across different instances, but imho the right way to
> > address that is extending one of the current DataStore implementations
> > or creating a new one.
> > I am also concerned that this Adaptable pattern would open the door to
> > other such hacks into the stack.
> >
> > My 2 cents,
> > Tommaso
> >
> >
> > On Thu, 5 May 2016 at 11:00 Francesco Mari
> > <mari.france...@gmail.com> wrote:
> >
> > > This proposal introduces a huge leak of abstractions and has deep
> > > security implications.
> > >
> > > I guess that the reason for this proposal is that some users of Oak
> > > would like to perform some operations on binaries in a more performant
> > > way by leveraging the way those binaries are stored. If this is the
> > > case, I suggest that those users evaluate an application-level
> > > solution implemented on top of the JCR API.
> > >
> > > If a user needs to store some important binary data (files, images,
> > > etc.) in an S3 bucket or on the file system for performance reasons,
> > > this shouldn't affect how Oak handles blobs internally. If some assets
> > > are of special interest to the user, then the user should bypass Oak
> > > and take care of the storage of those assets directly. Oak can be used
> > > to store *references* to those assets, which can be used in user code
> > > to manipulate the assets in the user's own business logic.
> > >
> > > If the scenario I outlined is not what inspired this proposal, I would
> > > like to know more about the reasons why this proposal was brought up.
> > > Which problems are we going to solve with this API? Is there a more
> > > concrete use case that we can use as a driving example?
> > >
> > > 2016-05-05 10:06 GMT+02:00 Davide Giannella <dav...@apache.org>:
> > >
> > > > On 04/05/2016 17:37, Ian Boston wrote:
> > > > > Hi,
> > > > > If the File or URL is writable, will writing to the location cause
> > > > > issues for Oak? IIRC some Oak DS implementations use a digest of
> > > > > the content to determine the location in the DS, so changing the
> > > > > content via Oak will change the location, but changing the content
> > > > > via the File or URL won't. If I didn't remember correctly, then
> > > > > ignore the concern. I'm fully supportive of the approach, as a
> > > > > consumer of Oak. The locations will almost certainly leak outside
> > > > > the context of an Oak session, so the API contract should make it
> > > > > clear that the code using a direct location needs to behave
> > > > > responsibly.
> > > > >
> > > >
> > > > It's a reasonable concern, and I'm not deep into the details of the
> > > > implementation. It's worth keeping in mind, though: if we want to
> > > > adapt to URL or File, maybe we'll have to come up with some sort of
> > > > read-only version of those.
> > > >
> > > > For the File class, IIRC, we could force/use the setReadOnly() and
> > > > setWritable() methods. I remember those being quite expensive in
> > > > time, though.
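> > > >
> > > > A tiny sketch of what I mean (assuming we hand out a java.io.File):
> > > >
> > > >     import java.io.File;
> > > >
> > > >     static File readOnlyHandle(File file) {
> > > >         // setReadOnly() goes down to the file system, which is
> > > >         // where the cost comes from.
> > > >         if (!file.setReadOnly()) {
> > > >             throw new IllegalStateException(
> > > >                     "Cannot mark " + file + " read-only");
> > > >         }
> > > >         return file;
> > > >     }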
> > > >
> > > > Davide
> > > >
> > > >
> > > >
> > >
> >
>
