Re: About JENA-2339 - security related

Martynas Jusevičius Sat, 30 Jul 2022 13:14:25 -0700

On Fri, Jul 29, 2022 at 7:27 PM Vilnis Termanis
<vilnis.terma...@iotics.com.invalid> wrote:
>
> (inline)
>
> On Fri, 29 Jul 2022 at 07:56, Martynas Jusevičius
> <marty...@atomgraph.com> wrote:
> >
> > “Sets of triples” — aren’t these datasets?
> >
> > Couldn’t this use case be addressed by maintaining per-user datasets? Not
> > sure if Fuseki can create datasets on the fly, but this seems like a much
> > simpler feature to implement compared to a whole new ACL mechanism.
>
> The idea is, that if you had these "sets of triples" A-Z, one user
> might be allowed to see A-M and another C-Q. With per-user datasets
> you'd have to duplicate data to achieve that. And, when the ACL
> changes, you'd have to copy/move triples from one dataset to another.
> (Or am I missing a nuance to your proposal? Do you mean dynamically
> creating a new dataset which references graphs from another dataset?)


No, not missing :)

I mean it sounds like a useful feature, and we could probably find use
for it ourselves.

But if the ACL is graph-scoped, can't it employ an existing ontology
such as WAC? [1]
It would be eating your own dogfood, and of course it being RDF you
could query and update your ACL using SPARQL. That would probably
require a meta-dataset containing ACL data for each secured dataset.
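
To make that a bit more concrete, here is a minimal sketch of what a
graph-scoped WAC check could look like as a SPARQL ASK query run against
such a meta-dataset. The acl: terms (acl:Authorization, acl:agent,
acl:agentClass, acl:accessTo, acl:mode) come from the WAC vocabulary; the
helper names iri() and build_wac_ask() and the example.org IRIs are made
up for illustration, and this is not the query LinkedDataHub uses.

```python
import re

ACL_NS = "http://www.w3.org/ns/auth/acl#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"
WAC_MODES = {"Read", "Write", "Append", "Control"}

# Characters that must not appear inside a SPARQL IRI reference (<...>).
_IRI_FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def iri(value: str) -> str:
    """Wrap a string as <...>, rejecting input that could break out of the IRI."""
    if not value or _IRI_FORBIDDEN.search(value):
        raise ValueError(f"not a safe IRI: {value!r}")
    return f"<{value}>"


def build_wac_ask(agent: str, graph: str, mode: str = "Read") -> str:
    """Build an ASK query: may `agent` access `graph` with the given WAC mode?

    Hypothetical helper. It matches authorizations granted to the agent
    directly, or to everyone via acl:agentClass foaf:Agent.
    """
    if mode not in WAC_MODES:
        raise ValueError(f"unknown WAC mode: {mode!r}")
    return "\n".join([
        f"PREFIX acl: <{ACL_NS}>",
        f"PREFIX foaf: <{FOAF_NS}>",
        "ASK {",
        "  ?auth a acl:Authorization ;",
        f"        acl:accessTo {iri(graph)} ;",
        f"        acl:mode acl:{mode} .",
        f"  {{ ?auth acl:agent {iri(agent)} }}",
        "  UNION",
        "  { ?auth acl:agentClass foaf:Agent }",
        "}",
    ])


if __name__ == "__main__":
    print(build_wac_ask("https://example.org/alice#me",
                        "https://example.org/graphs/A"))
```

The resulting string would be sent to whatever endpoint holds the ACL
data; a "true" answer means the graph may be included in the set of
visible graphs for that request.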

As it happens we have an authorization request filter for Jersey that
checks WAC access using SPARQL:
https://github.com/AtomGraph/LinkedDataHub/blob/master/src/main/java/com/atomgraph/linkeddatahub/server/filter/request/AuthorizationFilter.java
The SPARQL query:
https://github.com/AtomGraph/LinkedDataHub/blob/master/src/main/webapp/WEB-INF/web.xml#L25

[1] https://www.w3.org/wiki/WebAccessControl

>
> >
> > On Thu, 28 Jul 2022 at 22.51, Vilnis Termanis
> > <vilnis.terma...@iotics.com.invalid> wrote:
> >
> > > Hi Andy & Jena development community,
> > >
> > > (Answers inline - apologies if I repeat myself)
> > >
> > > FYI - Our aim is to enable end-users to make SPARQL queries whilst
> > > respecting visibility restrictions.
> > > I.e. users (indirectly) add sets of related triples to a dataset and
> > > they can choose who has visibility (beyond themselves) over these,
> > > either: Nobody, Everyone or a chosen set (which can be updated). Note
> > > that this restriction is not by a specific subject or predicate.
> > > (Although the sets of triples do have relationships - not all of them
> > > are known in advance.)
> > >
> > > On Thu, 28 Jul 2022 at 10:43, Andy Seaborne <a...@apache.org> wrote:
> > > >
> > > > JENA-2339
> > > > PR#1441
> > > >
> > > https://github.com/vtermanis/jena/blob/dynamic-graph-restriction-extension/MOVE_ME_DynamicACL_notes.md
> > > >
> > > > tl;dr:
> > > >
> > > > It is a different role for Fuseki.
> > > >
> > > > Fuseki executes the security, but the setup and control come from a trusted
> > > > external server on the request execution path.
> > > >
> > > > It assumes certain deployment environments to be safe.
> > >
> > > FYI - In our case this means that we have a "make SPARQL query" API
> > > call. When received, the applicable user (our domain) is known and, in
> > > the proposed PR, we can prepend the set of allowed graphs to the query
> > > (which have been looked up prior to query execution, externally). The
> > > end user has NO direct access to Fuseki itself.
> > >
> > > >
> > > > My feeling is that we should make Fuseki configurable enough so that a
> > > > downstream 3rd party can add their security solution that is suitable
> > > > for their environment. But we should not incorporate a particular
> > > > security solution that relies on the deployment environment.
> > > >
> > > > ----
> > > >
> > > > I've asked for more information about the claim on a performance
> > > > motivator and some other background information.
> > > >
> > > > The usage patterns are not yet clear. The data is described as "a one
> > > > graph per handful of subjects and their properties" and "100s of
> > > > graphs". What the queries are is unstated.
> > >
> > > Right now, each graph has in the range of 300-500 triples (though the
> > > amount depends on how much additional/domain-specific metadata
> > > end-users choose to add) and the scale of deployed Fuseki datasets
> > > ranges from a few to ~6k graphs.
> > > Since we'd like to allow end-users to run **any** queries they wish
> > > (we enforce query timeouts), it's difficult to give concrete examples.
> > > I can however say that TDB unionDefaultGraph mode is enabled (i.e.
> > > most end-users won't choose to explicitly target a specific graph) and
> > > that one of our representative "search" queries (which combines
> > > GeoSPARQL + multiple explicit property matching across multiple
> > > different subjects in a UNION + subsequent collection of mandatory &
> > > optional fields) is between 20-40% faster than the current custom
> > > solution.
> > > (Note that we have also tried query re-writing to insert FROM/FROM
> > > NAMED clauses - and that is very slow in comparison, presumably due to the
> > > higher-level filtering involved, unlike the quad filter herein.)
> > >
> > > >
> > > > There is no characterisation of the queries being made. If we are
> > > > talking about overheads, the cases of a few big queries and many small
> > > > queries are different.
> > >
> > > (pasted from JENA-2339 ticket) - using a "SELECT {} 1" query, and
> > > adding a certain set of graphs makes the queries on my laptop take:
> > > ~600 graphs ~115ms
> > > ~1500 graphs ~162ms
> > > ~3k graphs ~240ms
> > > ~6k graphs ~400ms
> > >
> > > >
> > > > The scale looks small (less than a million triples -
> > > > approximating as 100 graphs * 1000 triples). That makes the point about
> > > > access to TDB hooks a bit redundant.
> > >
> > > The dataset I've tested this with has ~1.8M triples. That's not to say
> > > this is the scale we're hoping to satisfy - that's just what I
> > > tested with first. By redundant, do you mean an alternative approach
> > > should be used for this scale?
> > >
> > > >
> > > >
> > > > There are distinguished users. A request from one of these users
> > > > causes the set of visible graphs to be read from a comment at the start
> > > > of the query text in the request.
> > > >
> > > > The use of large numbers of small named graphs to manage security
> > > > settings looks to me like triple-level security.  I have already
> > > > mentioned work "FMod_ABAC": (£job related) awhile back (2/Jan/2022). It
> > > > is triple level attribute-based security.
> > >
> > > It could well be that I'm seeing the wrong solution for the feature
> > > we're trying to support (that's the other reason for reaching out to
> > > the community). The reason (rightly or wrongly) to model this as a set
> > > of graphs is: Each set of triples to be restricted are related, but
> > > span multiple subjects and could also relate to other subjects in
> > > other sets (as well as externally).
> > > Hence I couldn't see how e.g. Jena Permissions could be applied here:
> > > When you're provided with a single triple to check - you would have to
> > > understand what type of subject it is and how it relates to the "top
> > > level" subject to which the ACL applies. Bundling everything into a
> > > graph seemed like a viable option.
> > >
> > > >
> > > > Concern 1:
> > > >
> > > > This bypasses Fuseki-provided security and puts the control function
> > > > outside the Fuseki server in a separate server that is not part of Jena.
> > > > It will only be secure if deployed in a constrained network environment.
> > > >
> > > > This is not secure except when run in a certain way and, personally, I
> > > > don't want to have to deal with a CVE because of that. CVE handling is
> > > > time consuming.
> > > >
> > > > I don't see why it is using jena-access (the named graph security
> > > > feature) except for the filtering on TDB. It is creating a dynamic
> > > > dataset for the query.
> > >
> > > You're right - it's only as secure as the middleware/proxy/whatever in
> > > front of it which supplies the ACL. (This was never intended to be
> > > used/exposed to end-users directly.)
> > > The purpose of extending jena-access (instead of immediately writing
> > > it as a separate module) was to illustrate with minimal code changes
> > > (+ extension of existing tests) what it could look like, for
> > > discussion. (The quad filtering / performance aspect would be the
> > > same, regardless of location, I presume.)
> > >
> > > >
> > > > Concern 2: How does update fit into the picture? (GSP is not supported).
> > >
> > > I thought that, since GSP operations target a single graph, there is
> > > no need to extend support to it since it's already possible to
> > > restrict visibility (with the graph query parameter). Am I missing
> > > something?
> > >
> > > >
> > > > Concern 3: It looks like a specific solution for a specific scenario.
> > > > Will it get uptake by the wider Jena user community?
> > >
> > > It's definitely specific. My thinking was that, if a subset of this
> > > were deemed useful, then it'd be better to exist as part of the core
> > > offering as opposed to us just bolting it on ourselves (at my job).
> > > But, if that's not the case - fair enough.
> > >
> > > >
> > > > Concern 4: Is there long-term support and maintenance for the feature?
> > > > (e.g. 5y+)
> > > > How do we respond to users@ messages about it? Is it experimental code or
> > > > has it been used for real? Is the feature set stable?
> > >
> > > My understanding is that jena-access is classed as stable (we're using
> > > it for something else already in production) and thus, since this
> > > merely produces a SecurityContext with a larger set of graphs, would
> > > theoretically be no less stable.
> > >
> > > >
> > > >
> > > > Opinion: it is not unreasonable to provide support for this kind of
> > > > customization of Fuseki.
> > > >
> > > > An extension can then provide whatever security is needed for the
> > > > situation and it is the Fuseki user/operator making the decisions about
> > > > what is acceptable security and what isn't.
> > > >
> > > > Fuseki has ways to add custom processors and this seems the way to
> > > > provide an alternative way to make queries.
> > > >
> > > > Putting it in the distribution codebase is a big step for the project.
> > > > At the very least, it needs to be mature and likely to be used.
> > >
> > > We wouldn't be reaching out if we weren't likely to want to use such a
> > > feature. All these concerns/questions/suggestions are exactly what we
> > > were hoping for. If I can provide any more context/tests/samples, let
> > > me know.
> > > (I completely get the concerns about diluting a known security feature
> > > and have no issue with something like this being a separate
> > > component.)
> > >
> > > >
> > > > Background: Currently jena-access is in Fuseki main. It is not optional
> > > > because it predates Fuseki modules.
> > > >
> > > >      Andy
> > >
> > >
> > >
> > > --
> > > Vilnis Termanis
> > > Technical Specialist
> > >
> > > e | vilnis.terma...@iotics.com
> > > www.iotics.com
> > >
>
>
>
> --
> Vilnis Termanis
> Technical Specialist
>
> e | vilnis.terma...@iotics.com
> www.iotics.com
