Re: About JENA-2339 - security related

Martynas Jusevičius Mon, 08 Aug 2022 10:07:47 -0700

On Mon, 8 Aug 2022 at 17.21, Vilnis Termanis
<[email protected]> wrote:


> On Sat, 30 Jul 2022 at 21:14, Martynas Jusevičius
> <[email protected]> wrote:
> >
> > On Fri, Jul 29, 2022 at 7:27 PM Vilnis Termanis
> > <[email protected]> wrote:
> > >
> > > (inline)
> > >
> > > On Fri, 29 Jul 2022 at 07:56, Martynas Jusevičius
> > > <[email protected]> wrote:
> > > >
> > > > “Sets of triples” — aren’t these datasets?
> > > >
> > > > Couldn’t this use case be addressed by maintaining per-user datasets?
> > > > Not sure if Fuseki can create datasets on the fly, but this seems like
> > > > a much simpler feature to implement compared to a whole new ACL
> > > > mechanism.
> > >
> > > The idea is, that if you had these "sets of triples" A-Z, one user
> > > might be allowed to see A-M and another C-Q. With per-user datasets
> > > you'd have to duplicate data to achieve that. And, when the ACL
> > > changes, you'd have to copy/move triples from one dataset to another.
> > > (Or am I missing a nuance to your proposal? Do you mean dynamically
> > > creating a new dataset which references graphs from another dataset?)
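
To put numbers on the A-Z example above, here is a small sketch in plain Python. The user names and the `letters` helper are made up for illustration; nothing here touches Jena or Fuseki.

```python
from string import ascii_uppercase


def letters(first: str, last: str) -> set[str]:
    """Names of the sets of triples from `first` to `last`, inclusive."""
    start = ascii_uppercase.index(first)
    end = ascii_uppercase.index(last)
    return set(ascii_uppercase[start:end + 1])


# With a shared dataset the ACL is only a set of names per user.
acl = {"user1": letters("A", "M"), "user2": letters("C", "Q")}

# Shared dataset: every visible set of triples is stored exactly once.
shared_copies = len(set().union(*acl.values()))

# Per-user datasets: every user needs a private copy of each visible set.
per_user_copies = sum(len(visible) for visible in acl.values())

# Sets that would have to be stored twice with per-user datasets.
duplicated = acl["user1"] & acl["user2"]

print(shared_copies, per_user_copies, len(duplicated))
```

With these two users the shared layout stores 17 sets (A-Q), while per-user datasets store 28, because the 11 sets C-M are held twice. An ACL change in the shared layout only edits a set of names; in the per-user layout it means copying or deleting triples.
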
> >
> > No, not missing :)
> >
> > I mean it sounds like a useful feature, and we could probably find use
> > for it ourselves.
> >
> > But if the ACL is graph-scoped, can't it employ an existing ontology
> > such as WAC? [1]
> > It would be eating your own dogfood, and of course it being RDF you
> > could query and update your ACL using SPARQL. That would probably
> > require a meta-dataset containing ACL data for each secured dataset.
>
> It definitely could (and in fact, we are doing pretty much what you
> describe right now).
> However, thinking in general terms - aren't there two levels to such
> an ACL solution:
>
> 1) ACL is treated as completely external to Jena/Fuseki: Something
> else is responsible for providing the "allow list" of graphs. (And:
> Ideally there is no hard requirement to require a Java integration to
> use the feature.)


This option has the advantage of not being specific to Fuseki, meaning that
it can work with any triplestore. The access check is encapsulated as a
SPARQL query and can be easily reused across frameworks.
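
A minimal sketch of what such a reusable check might look like, assuming WAC-style authorizations are stored as RDF. The `build_access_query` helper, the shape of the ASK pattern, and the example IRIs are assumptions made for illustration; this is not the query LinkedDataHub ships.

```python
import re

ACL_NS = "http://www.w3.org/ns/auth/acl#"

# Rough safety check: refuse characters that SPARQL forbids inside <...>.
_IRI_CHARS = re.compile(r'[^\x00-\x20<>"{}|\\^`]+')


def iri(value: str) -> str:
    """Wrap a value as a SPARQL IRI reference, refusing unsafe input."""
    if not _IRI_CHARS.fullmatch(value):
        raise ValueError(f"not a safe IRI: {value!r}")
    return f"<{value}>"


def build_access_query(agent: str, resource: str, mode: str = "Read") -> str:
    """Build an ASK query that holds if `agent` has `mode` access to `resource`."""
    if mode not in {"Read", "Write", "Append", "Control"}:
        raise ValueError(f"unknown WAC access mode: {mode!r}")
    return (
        f"PREFIX acl: <{ACL_NS}>\n"
        "ASK {\n"
        "  ?auth a acl:Authorization ;\n"
        f"        acl:agent {iri(agent)} ;\n"
        f"        acl:accessTo {iri(resource)} ;\n"
        f"        acl:mode acl:{mode} .\n"
        "}"
    )


# Example IRIs are placeholders.
query = build_access_query("https://example.org/alice#me",
                           "https://example.org/graphs/a")
print(query)
```

The resulting string can be sent to any SPARQL endpoint that holds the ACL data, which is what makes the check independent of the triplestore and of the web framework in front of it.
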


> 2) ACL is enabled by storing rules in a specific graph in a Jena
> dataset (and there I agree WAC seems very sensible - as you've
> linked).
>
> I'm querying about (1) where Jena/Fuseki is not necessarily the centre
> of the picture, but part of multiple components.
>
> >
> > As it happens we have an authorization request filter for Jersey that
> > checks WAC access using SPARQL:
> >
> > https://github.com/AtomGraph/LinkedDataHub/blob/master/src/main/java/com/atomgraph/linkeddatahub/server/filter/request/AuthorizationFilter.java
> > The SPARQL query:
> >
> > https://github.com/AtomGraph/LinkedDataHub/blob/master/src/main/webapp/WEB-INF/web.xml#L25
> >
> > [1] https://www.w3.org/wiki/WebAccessControl
> >
> > >
> > > >
> > > > On Thu, 28 Jul 2022 at 22.51, Vilnis Termanis
> > > > <[email protected]> wrote:
> > > >
> > > > > Hi Andy & Jena development community,
> > > > >
> > > > > (Answers inline - apologies if I repeat myself)
> > > > >
> > > > > FYI - Our aim is to enable end-users to make SPARQL queries whilst
> > > > > respecting visibility restrictions.
> > > > > I.e. users (indirectly) add sets of related triples to a dataset and
> > > > > they can choose who has visibility (beyond themselves) over these,
> > > > > either: Nobody, Everyone or a chosen set (which can be updated). Note
> > > > > that this restriction is not by a specific subject or predicate.
> > > > > (Although the sets of triples do have relationships - not all of them
> > > > > are known in advance.)
> > > > >
> > > > > On Thu, 28 Jul 2022 at 10:43, Andy Seaborne <[email protected]> wrote:
> > > > > >
> > > > > > JENA-2339
> > > > > > PR#1441
> > > > > >
> > > > >
> > > > > > https://github.com/vtermanis/jena/blob/dynamic-graph-restriction-extension/MOVE_ME_DynamicACL_notes.md
> > > > > >
> > > > > > tl;dr:
> > > > > >
> > > > > > It is a different role for Fuseki.
> > > > > >
> > > > > > Fuseki executes the security but the setup and control is from a
> > > > > > trusted external server on the request execution path.
> > > > > >
> > > > > > It assumes certain deployment environments to be safe.
> > > > >
> > > > > FYI - In our case this means that we have a "make SPARQL query" API
> > > > > call. When received, the applicable user (our domain) is known and,
> > > > > in the proposed PR, we can prepend the set of allowed graphs to the
> > > > > query (which have been looked up prior to query execution,
> > > > > externally). The end user has NO direct access to Fuseki itself.
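
The flow described above (a trusted middleware prepends the allow list, the server reads it back before executing the query) can be sketched roughly as follows. The `#allowed-graphs:` marker and both function names are invented for this illustration; the actual format used in the PR is not shown in this thread.

```python
HEADER = "#allowed-graphs:"  # hypothetical marker, invented for this sketch


def prepend_allowed_graphs(query: str, graphs: list[str]) -> str:
    """Middleware side: put the allow list in a comment on the first line."""
    for graph in graphs:
        if not graph or any(c.isspace() or c in "<>" for c in graph):
            raise ValueError(f"unsafe graph name: {graph!r}")
    return f"{HEADER} {' '.join(graphs)}\n{query}"


def read_allowed_graphs(request_text: str) -> tuple[set[str], str]:
    """Server side: split a request into (allowed graphs, original query)."""
    first_line, _, rest = request_text.partition("\n")
    if not first_line.startswith(HEADER):
        raise ValueError("missing allow-list header")
    return set(first_line[len(HEADER):].split()), rest


# Placeholder graph names and query.
user_query = "SELECT * WHERE { ?s ?p ?o }"
request = prepend_allowed_graphs(user_query, ["urn:g:a", "urn:g:b"])
allowed, query = read_allowed_graphs(request)
print(allowed, query)
```

Because `#` starts a comment in SPARQL, the request text stays a valid query. The sketch also shows why the end user must not reach the server directly: anyone who can write the first line controls the allow list.
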
> > > > >
> > > > > >
> > > > > > My feeling is that we should make Fuseki configurable enough so
> > > > > > that a downstream 3rd party can add their security solution that is
> > > > > > suitable for their environment. But we should not incorporate a
> > > > > > particular security solution that relies on the deployment
> > > > > > environment.
> > > > > >
> > > > > > ----
> > > > > >
> > > > > > I've asked for more information about the claim on a performance
> > > > > > motivator and some other background information.
> > > > > >
> > > > > > The usage patterns are not yet clear. The data is described as "a
> > > > > > one graph per handful of subjects and their properties" and "100s
> > > > > > of graphs". What the queries are is unstated.
> > > > >
> > > > > Right now, each graph has in the range of 300-500 triples (though the
> > > > > amount depends on how much additional/domain-specific metadata
> > > > > end-users choose to add) and the scale of deployed Fuseki datasets
> > > > > ranges from having a few to ~6k graphs.
> > > > > Since we'd like to allow end-users to run **any** queries they wish
> > > > > (we enforce query timeouts), it's difficult to give concrete examples.
> > > > > I can however say that TDB unionDefaultGraph mode is enabled (i.e.
> > > > > most end-users won't choose to explicitly target a specific graph) and
> > > > > that one of our representative "search" queries (which combines
> > > > > GeoSPARQL + multiple explicit property matching across multiple
> > > > > different subjects in a UNION + subsequent collection of mandatory &
> > > > > optional fields) is between 20-40% faster than the current custom
> > > > > solution.
> > > > > (Note that we have also tried query re-writing to insert FROM/FROM
> > > > > NAMED clauses - and that is very slow in comparison, presumably due to
> > > > > the higher level filtering involved, unlike the quad filter herein.)
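
The quad-filter idea can be illustrated without any Jena code. The sketch below uses an in-memory list of quads in plain Python; `Quad`, `visible_quads` and `union_default_graph` are hypothetical names, and nothing here reflects how TDB or jena-access implement filtering internally.

```python
from typing import Iterable, Iterator, NamedTuple


class Quad(NamedTuple):
    graph: str
    subject: str
    predicate: str
    obj: str


def visible_quads(quads: Iterable[Quad], allowed: frozenset[str]) -> Iterator[Quad]:
    """Yield only quads whose graph is on the allow list (one set lookup each)."""
    return (q for q in quads if q.graph in allowed)


def union_default_graph(quads: Iterable[Quad],
                        allowed: frozenset[str]) -> set[tuple[str, str, str]]:
    """Triples seen through a union default graph limited to allowed graphs."""
    return {(q.subject, q.predicate, q.obj) for q in visible_quads(quads, allowed)}


# Placeholder data: two graphs, one of which the user may see.
store = [
    Quad("urn:g:a", "urn:s:1", "urn:p:name", "alpha"),
    Quad("urn:g:b", "urn:s:2", "urn:p:name", "beta"),
]
triples = union_default_graph(store, frozenset({"urn:g:a"}))
print(triples)
```

The point of filtering at this level is that the query itself is left untouched: the restriction is applied when quads are read, not by rewriting the query with FROM/FROM NAMED clauses.
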
> > > > >
> > > > > >
> > > > > > There is no characterisation of the queries being made. If we are
> > > > > > talking about overheads, the cases of a few big queries and many
> > > > > > small queries are different.
> > > > >
> > > > > (pasted from JENA-2339 ticket) - using a "SELECT {} 1" query, and
> > > > > adding a certain set of graphs makes the queries on my laptop take:
> > > > > ~600 graphs ~115ms
> > > > > ~1500 graphs ~162ms
> > > > > ~3k graphs ~240ms
> > > > > ~6k graphs ~400ms
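
A back-of-the-envelope reading of these figures, treating the approximate values as exact and assuming the cost grows linearly with the number of graphs (an assumption of this sketch, not a claim from the ticket):

```python
# (graph count, latency in ms) as reported above; all values are approximate.
samples = [(600, 115), (1500, 162), (3000, 240), (6000, 400)]


def slopes(points: list[tuple[int, int]]) -> list[float]:
    """Extra milliseconds per additional graph between consecutive samples."""
    return [(t2 - t1) / (n2 - n1)
            for (n1, t1), (n2, t2) in zip(points, points[1:])]


per_graph_ms = slopes(samples)

# Fit a line through the first and last sample to estimate the fixed cost.
(n_first, t_first), (n_last, t_last) = samples[0], samples[-1]
overall_slope = (t_last - t_first) / (n_last - n_first)
base_ms = t_first - overall_slope * n_first

print(per_graph_ms, overall_slope, base_ms)
```

All three segments come out at roughly 0.05 ms per allowed graph, on top of a fixed cost of a bit over 80 ms, so the reported overhead looks close to linear in the size of the allow list.
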
> > > > >
> > > > > >
> > > > > > The scale looks small (less than a million triples -
> > > > > > approximating as 100 graphs * 1000 triples). That makes the point
> > > > > > about access to TDB hooks a bit redundant.
> > > > >
> > > > > The dataset I've tested this with has ~1.8M triples. That's not to say
> > > > > this is the scale we're hoping to satisfy - that's just what I
> > > > > tested with first. By redundant, do you mean an alternative approach
> > > > > should be used for this scale?
> > > > >
> > > > > >
> > > > > >
> > > > > > There are distinguished users. A request from one of these users
> > > > > > causes the set of visible graphs to be read from a comment at the
> > > > > > start of the query text in the request.
> > > > > >
> > > > > > The use of large numbers of small named graphs to manage security
> > > > > > settings looks to me like triple-level security.  I have already
> > > > > > mentioned work "FMod_ABAC": (£job related) awhile back (2/Jan/2022).
> > > > > > It is triple level attribute-based security.
> > > > >
> > > > > It could well be that I'm seeing the wrong solution for the feature
> > > > > we're trying to support (that's the other reason for reaching out to
> > > > > the community). The reason (rightly or wrongly) to model this as a set
> > > > > of graphs is: Each set of triples to be restricted are related, but
> > > > > span multiple subjects and could also relate to other subjects in
> > > > > other sets (as well as externally).
> > > > > Hence I couldn't see how e.g. Jena Permissions could be applied here:
> > > > > When you're provided with a single triple to check - you would have to
> > > > > understand what type of subject it is and how it relates to the "top
> > > > > level" subject to which the ACL applies. Bundling everything into a
> > > > > graph seemed like a viable option.
> > > > >
> > > > > >
> > > > > > Concern 1:
> > > > > >
> > > > > > This bypasses Fuseki-provided security and puts the control function
> > > > > > outside the Fuseki server in a separate server that is not part of
> > > > > > Jena. It will only be secure if deployed in a constrained network
> > > > > > environment.
> > > > > >
> > > > > > This is not secure except when run in a certain way and, personally,
> > > > > > I don't want to have to deal with a CVE because of that. CVE handling
> > > > > > is time consuming.
> > > > > >
> > > > > > I don't see why it is using jena-access (the named graph security
> > > > > > feature) except for the filtering on TDB. It is creating a dynamic
> > > > > > dataset for the query.
> > > > >
> > > > > You're right - it's only as secure as the middleware/proxy/whatever in
> > > > > front of it which supplies the ACL. (This was never intended to be
> > > > > used/exposed to end-users directly.)
> > > > > The purpose of extending jena-access (instead of immediately writing
> > > > > it as a separate module) was to illustrate with minimal code changes
> > > > > (+ extension of existing tests) what it could look like, for
> > > > > discussion. (The quad filtering / performance aspect would be the
> > > > > same, regardless of location, I presume.)
> > > > >
> > > > > >
> > > > > > Concern 2: How does update fit into the picture? (GSP is not
> > > > > > supported).
> > > > >
> > > > > I thought that, since GSP operations target a single graph, there is
> > > > > no need to extend support to it since it's already possible to
> > > > > restrict visibility (with the graph query parameter). Am I missing
> > > > > something?
> > > > >
> > > > > >
> > > > > > Concern 3: It looks like a specific solution for a specific scenario.
> > > > > > Will it get uptake by the wider Jena user community?
> > > > >
> > > > > It's definitely specific. My thinking was that, if a subset of this
> > > > > were deemed useful, then it'd be better to exist as part of the core
> > > > > offering as opposed to us just bolting it on ourselves (at my job).
> > > > > But, if that's not the case - fair enough.
> > > > >
> > > > > >
> > > > > > Concern 4: Is there long-term support and maintenance for the
> > > > > > feature? (e.g. 5y+)
> > > > > > How do we respond to users@ message about it? Is it experimental code
> > > > > > or has it been used for real? Is the feature set stable?
> > > > >
> > > > > My understanding is that jena-access is classed as stable (we're using
> > > > > it for something else already in production) and thus, since this
> > > > > merely produces a SecurityContext with a larger set of graphs, would
> > > > > theoretically be no less stable.
> > > > >
> > > > > >
> > > > > >
> > > > > > Opinion: it is not unreasonable to provide support for this kind of
> > > > > > customization of Fuseki.
> > > > > >
> > > > > > An extension can then provide whatever security is needed for the
> > > > > > situation and it is the Fuseki user/operator making the decisions
> > > > > > about what is acceptable security and what isn't.
> > > > > >
> > > > > > Fuseki has ways to add custom processors and this seems the way to
> > > > > > provide an alternative way to make queries.
> > > > > >
> > > > > > Putting it in the distribution codebase is a big step for the project.
> > > > > > At the very least, it needs to be mature and likely to be used.
> > > > >
> > > > > We wouldn't be reaching out if we weren't likely to want to use such a
> > > > > feature. All these concerns/questions/suggestions are exactly what we
> > > > > were hoping for. If I can provide any more context/tests/samples, let
> > > > > me know.
> > > > > (I completely get the concerns about diluting a known security feature
> > > > > and have no issue with something like this being a separate
> > > > > component.)
> > > > >
> > > > > >
> > > > > > Background: Currently jena-access is in Fuseki main. It is not
> > > > > > optional because it predates Fuseki modules.
> > > > > >
> > > > > >      Andy
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Vilnis Termanis
> > > > > Technical Specialist
> > > > >
> > > > > e | [email protected]
> > > > > www.iotics.com
> > > > >
> > >
> > >
> > >
> > > --
> > > Vilnis Termanis
> > > Technical Specialist
> > >
> > > e | [email protected]
> > > www.iotics.com
> > >
> > > The information contained in this email is strictly confidential and
> > > intended only for the parties noted. If this email was not intended
> > > for your use, please contact Iotics. For more on our Privacy Policy
> > > please visit https://www.iotics.com/legal/
