“Sets of triples” — aren’t these datasets?

Couldn’t this use case be addressed by maintaining per-user datasets? Not
sure if Fuseki can create datasets on the fly, but this seems like a much
simpler feature to implement compared to a whole new ACL mechanism.

On Thu, 28 Jul 2022 at 22.51, Vilnis Termanis
<vilnis.terma...@iotics.com.invalid> wrote:

> Hi Andy & Jena development community,
>
> (Answers inline - apologies if I repeat myself)
>
> FYI - Our aim is to enable end-users to make SPARQL queries whilst
> respecting visibility restrictions.
> I.e. users (indirectly) add sets of related triples to a dataset and
> they can choose who has visibility (beyond themselves) over these,
> either: Nobody, Everyone or a chosen set (which can be updated). Note
> that this restriction is not by a specific subject or predicate.
> (Although the sets of triples do have relationships - not all of them
> are known in advance.)
>
> On Thu, 28 Jul 2022 at 10:43, Andy Seaborne <a...@apache.org> wrote:
> >
> > JENA-2339
> > PR#1441
> >
> > https://github.com/vtermanis/jena/blob/dynamic-graph-restriction-extension/MOVE_ME_DynamicACL_notes.md
> >
> > tl;dr:
> >
> > It is a different role for Fuseki.
> >
> > Fuseki executes the security but the setup and control is from a trusted
> > external server on the request execution path.
> >
> > It assumes certain deployment environments to be safe.
>
> FYI - In our case this means that we have a "make SPARQL query" API
> call. When received, the applicable user (our domain) is known and, in
> the proposed PR, we can prepend the set of allowed graphs to the query
> (which have been looked up prior to query execution, externally). The
> end user has NO direct access to Fuseki itself.
>
> >
> > My feeling is that we should make Fuseki configurable enough so that a
> > downstream 3rd party can add their security solution that is suitable
> > for their environment. But we should not incorporate a particular
> > security solution that relies on the deployment environment.
> >
> > ----
> >
> > I've asked for more information about the claim on a performance
> > motivator and some other background information.
> >
> > The usage patterns are not yet clear. The data is described as "a one
> > graph per handful of subjects and their properties" and "100s of
> > graphs". What the queries are is unstated.
>
> Right now, each graph has in the range of 300-500 triples (though the
> amount depends on how much additional/domain-specific metadata
> end-users choose to add) and the scale of deployed Fuseki datasets
> range from having a few to ~6k graphs.
> Since we'd like to allow end-users to run **any** queries they wish
> (we enforce query timeouts), it's difficult to give concrete examples.
> I can however say that TDB unionDefaultGraph mode is enabled (i.e.
> most end-users won't choose to explicitly target a specific graph) and
> that one of our representative "search" queries (which combines
> GeoSPARQL + multiple explicit property matching across multiple
> different subjects in a UNION + subsequent collection of mandatory &
> optional fields) is between 20-40% faster than the current custom
> solution.
> (Note that we have also tried query re-writing to insert FROM/FROM
> NAMED clauses - and that is very slow in comparison, presumably due to
> the higher-level filtering involved, unlike the quad filter herein.)
>
> >
> > There is no characterisation of the queries being made. If we are
> > talking about overheads, the cases of a few big queries and many small
> > queries are different.
>
> (pasted from JENA-2339 ticket) - using a "SELECT {} 1" query, and
> adding a certain set of graphs makes the queries on my laptop take:
> ~600 graphs ~115ms
> ~1500 graphs ~162ms
> ~3k graphs ~240ms
> ~6k graphs ~400ms
>
> >
> > The scale looks small (less than a million triples -
> > approximating as 100 graphs * 1000 triples). That makes the point about
> > access to TDB hooks a bit redundant.
>
> The dataset I've tested this with has ~1.8M triples. That's not to say
> this is the scale we're hoping to satisfy - that's just what I
> tested with first. By redundant, do you mean an alternative approach
> should be used for this scale?
>
> >
> >
> > There are distinguished users. A request from one of these users
> > causes the set of visible graphs to be read from a comment at the start
> > of the query text in the request.
> >
> > The use of large numbers of small named graphs to manage security
> > settings looks to me like triple-level security.  I have already
> > mentioned work "FMod_ABAC": (£job related) awhile back (2/Jan/2022). It
> > is triple level attribute-based security.
>
> It could well be that I'm seeing the wrong solution for the feature
> we're trying to support (that's the other reason for reaching out to
> the community). The reason (rightly or wrongly) to model this as a set
> of graphs is: the triples in each restricted set are related, but
> span multiple subjects and could also relate to other subjects in
> other sets (as well as externally).
> Hence I couldn't see how e.g. Jena Permissions could be applied here:
> When you're provided with a single triple to check - you would have to
> understand what type of subject it is and how it relates to the "top
> level" subject to which the ACL applies. Bundling everything into a
> graph seemed like a viable option.
>
> >
> > Concern 1:
> >
> > This bypasses Fuseki-provided security and puts the control function
> > outside the Fuseki server in a separate server that is not part of Jena.
> > It will only be secure if deployed in a constrained network environment.
> >
> > This is not secure except when run in a certain way and, personally, I
> > don't want to have to deal with a CVE because of that. CVE handling is
> > time consuming.
> >
> > I don't see why it is using jena-access (the named graph security
> > feature) except for the filtering on TDB. It is creating a dynamic
> > dataset for the query.
>
> You're right - it's only as secure as the middleware/proxy/whatever in
> front of it which supplies the ACL. (This was never intended to be
> used/exposed to end-users directly.)
> The purpose of extending jena-access (instead of immediately writing
> it as a separate module) was to illustrate with minimal code changes
> (+ extension of existing tests) what it could look like, for
> discussion. (The quad filtering / performance aspect would be the
> same, regardless of location, I presume.)
>
> >
> > Concern 2: How does update fit into the picture? (GSP is not supported).
>
> I thought that, since GSP operations target a single graph, there is
> no need to extend support to it since it's already possible to
> restrict visibility (with the graph query parameter). Am I missing
> something?
>
> >
> > Concern 3: It looks like a specific solution for a specific scenario.
> > Will it get uptake by the wider Jena user community?
>
> It's definitely specific. My thinking was that, if a subset of this
> were deemed useful, then it'd be better to exist as part of the core
> offering as opposed to us just bolting it on ourselves (at my job).
> But, if that's not the case - fair enough.
>
> >
> > Concern 4: Is there long-term support and maintenance for the feature?
> > (e.g. 5y+)
> > How do we respond to users@ messages about it? Is it experimental code or
> > has it been used for real? Is the feature set stable?
>
> My understanding is that jena-access is classed as stable (we're using
> it for something else already in production). Since this change
> merely produces a SecurityContext with a larger set of graphs, it
> should theoretically be no less stable.
>
> >
> >
> > Opinion: it is not unreasonable to provide support for this kind of
> > customization of Fuseki.
> >
> > An extension can then provide whatever security is needed for the
> > situation and it is the Fuseki user/operator making the decisions about
> > what is acceptable security and what isn't.
> >
> > Fuseki has ways to add custom processors and this seems the way to
> > provide an alternative way to make queries.
> >
> > Putting it in the distribution codebase is a big step for the project.
> > At the very least, it needs to be mature and likely to be used.
>
> We wouldn't be reaching out if we weren't likely to want to use such a
> feature. All these concerns/questions/suggestions are exactly what we
> were hoping for. If I can provide any more context/tests/samples, let
> me know.
> (I completely get the concerns about diluting a known security feature
> and have no issue with something like this being a separate
> component.)
>
> >
> > Background: Currently jena-access is in Fuseki main. It is not optional
> > because it predates Fuseki modules.
> >
> >      Andy
>
>
>
> --
> Vilnis Termanis
> Technical Specialist
>
> e | vilnis.terma...@iotics.com
> www.iotics.com
>
