“Sets of triples” — aren’t these datasets? Couldn’t this use case be addressed by maintaining per-user datasets? Not sure if Fuseki can create datasets on the fly, but this seems like a much simpler feature to implement compared to a whole new ACL mechanism.
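On the "create datasets on the fly" question: a minimal sketch, assuming the deployment runs a Fuseki server with the HTTP administration protocol enabled, which accepts `POST /$/datasets` with `dbName` and `dbType` form parameters. The base URL, the dataset naming scheme and the helper name `create_dataset_request` are hypothetical; the function only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def create_dataset_request(base_url: str, name: str, db_type: str = "tdb2") -> Request:
    """Build (but do not send) an admin request that asks Fuseki for a new dataset.

    Assumption: the admin endpoint is reachable at <base_url>/$/datasets.
    """
    body = urlencode({"dbName": name, "dbType": db_type}).encode("ascii")
    return Request(
        f"{base_url.rstrip('/')}/$/datasets",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical per-user dataset name; sending it would need urllib.request.urlopen(req).
req = create_dataset_request("http://localhost:3030/", "user-alice")
```

Per-user datasets on their own would not cover the "chosen set" visibility case described in the quoted message, so this only illustrates the dataset-creation step.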
On Thu, 28 Jul 2022 at 22.51, Vilnis Termanis <vilnis.terma...@iotics.com.invalid> wrote:
> Hi Andy & Jena development community,
>
> (Answers inline - apologies if I repeat myself)
>
> FYI - Our aim is to enable end-users to make SPARQL queries whilst
> respecting visibility restrictions.
> I.e. users (indirectly) add sets of related triples to a dataset and
> they can choose who has visibility (beyond themselves) over these,
> either: Nobody, Everyone or a chosen set (which can be updated). Note
> that this restriction is not by a specific subject or predicate.
> (Although the sets of triples do have relationships - not all of them
> are known in advance.)
>
> On Thu, 28 Jul 2022 at 10:43, Andy Seaborne <a...@apache.org> wrote:
> >
> > JENA-2339
> > PR#1441
> >
> > https://github.com/vtermanis/jena/blob/dynamic-graph-restriction-extension/MOVE_ME_DynamicACL_notes.md
> >
> > tl;dr:
> >
> > It is a different role for Fuseki.
> >
> > Fuseki executes the security but the setup and control is from a trusted
> > external server on the request execution path.
> >
> > It assumes certain deployment environments to be safe.
>
> FYI - In our case this means that we have a "make SPARQL query" API
> call. When received, the applicable user (our domain) is known and, in
> the proposed PR, we can prepend the set of allowed graphs to the query
> (which have been looked up prior to query execution, externally). The
> end user has NO direct access to Fuseki itself.
>
> >
> > My feeling is that we should make Fuseki configurable enough so that a
> > downstream 3rd party can add their security solution that is suitable
> > for their environment. But we should not incorporate a particular
> > security solution that relies on the deployment environment.
> >
> > ----
> >
> > I've asked for more information about the claim on a performance
> > motivator and some other background information.
> >
> > The usage patterns are not yet clear.
> > The data is described as "a one
> > graph per handful of subjects and their properties" and "100s of
> > graphs". What the queries are is unstated.
>
> Right now, each graph has in the range of 300-500 triples (though the
> amount depends on how much additional/domain-specific metadata
> end-users choose to add) and the scale of deployed Fuseki datasets
> ranges from having a few to ~6k graphs.
> Since we'd like to allow end-users to run **any** queries they wish
> (we enforce query timeouts), it's difficult to give concrete examples.
> I can however say that TDB unionDefaultGraph mode is enabled (i.e.
> most end-users won't choose to explicitly target a specific graph) and
> that one of our representative "search" queries (which combines
> GeoSPARQL + multiple explicit property matching across multiple
> different subjects in a UNION + subsequent collection of mandatory &
> optional fields) is between 20-40% faster than the current custom
> solution.
> (Note that we have also tried query re-writing to insert FROM/FROM
> NAMED clauses - and that is very slow in comparison, presumably due to the
> higher level filtering involved, unlike the quad filter herein.)
>
> >
> > There is no characterisation of the queries being made. If we are
> > talking about overheads, the cases of a few big queries and many small
> > queries are different.
>
> (pasted from JENA-2339 ticket) - using a "SELECT {} 1" query, and
> adding a certain set of graphs makes the queries on my laptop take:
> ~600 graphs ~115ms
> ~1500 graphs ~162ms
> ~3k graphs ~240ms
> ~6k graphs ~400ms
>
> >
> > The scale looks small (less than a million triples -
> > approximating as 100 graphs * 1000 triples). That makes the point about
> > access to TDB hooks a bit redundant.
>
> The dataset I've tested this with has ~1.8M triples. That's not to say
> this is the scale we're hoping to satisfy - that's just what I
> tested with first.
> By redundant, do you mean an alternative approach
> should be used for this scale?
>
> >
> >
> > There are distinguished users. A request from one of these users
> > causes the set of visible graphs to be read from a comment at the start
> > of the query text in the request.
> >
> > The use of large numbers of small named graphs to manage security
> > settings looks to me like triple-level security. I have already
> > mentioned work "FMod_ABAC": (£job related) a while back (2/Jan/2022). It
> > is triple level attribute-based security.
>
> It could well be that I'm seeing the wrong solution for the feature
> we're trying to support (that's the other reason for reaching out to
> the community). The reason (rightly or wrongly) to model this as a set
> of graphs is: Each set of triples to be restricted are related, but
> span multiple subjects and could also relate to other subjects in
> other sets (as well as externally).
> Hence I couldn't see how e.g. Jena Permissions could be applied here:
> When you're provided with a single triple to check - you would have to
> understand what type of subject it is and how it relates to the "top
> level" subject to which the ACL applies. Bundling everything into a
> graph seemed like a viable option.
>
> >
> > Concern 1:
> >
> > This bypasses Fuseki-provided security and puts the control function
> > outside the Fuseki server in a separate server that is not part of Jena.
> > It will only be secure if deployed in a constrained network environment.
> >
> > This is not secure except when run in a certain way and, personally, I
> > don't want to have to deal with a CVE because of that. CVE handling is
> > time consuming.
> >
> > I don't see why it is using jena-access (the named graph security
> > feature) except for the filtering on TDB. It is creating a dynamic
> > dataset for the query.
>
> You're right - it's only as secure as the middleware/proxy/whatever in
> front of it which supplies the ACL.
> (This was never intended to be
> used/exposed to end-users directly.)
> The purpose of extending jena-access (instead of immediately writing
> it as a separate module) was to illustrate with minimal code changes
> (+ extension of existing tests) what it could look like, for
> discussion. (The quad filtering / performance aspect would be the
> same, regardless of location, I presume.)
>
> >
> > Concern 2: How does update fit into the picture? (GSP is not supported).
>
> I thought that, since GSP operations target a single graph, there is
> no need to extend support to it since it's already possible to
> restrict visibility (with the graph query parameter). Am I missing
> something?
>
> >
> > Concern 3: It looks like a specific solution for a specific scenario.
> > Will it get uptake by the wider Jena user community?
>
> It's definitely specific. My thinking was that, if a subset of this
> were deemed useful, then it'd be better to exist as part of the core
> offering as opposed to us just bolting it on ourselves (at my job).
> But, if that's not the case - fair enough.
>
> >
> > Concern 4: Is there long-term support and maintenance for the feature?
> > (e.g. 5y+)
> > How do we respond to users@ message about it? Is it experimental code or
> > has it been used for real? Is the feature set stable?
>
> My understanding is that jena-access is classed as stable (we're using
> it for something else already in production) and thus, since this
> merely produces a SecurityContext with a larger set of graphs, would
> theoretically be no less stable.
>
> >
> >
> > Opinion: it is not unreasonable to provide support for this kind of
> > customization of Fuseki.
> >
> > An extension can then provide whatever security is needed for the
> > situation and it is the Fuseki user/operator making the decisions about
> > what is acceptable security and what isn't.
> >
> > Fuseki has ways to add custom processors and this seems the way to
> > provide an alternative way to make queries.
> >
> > Putting it in the distribution codebase is a big step for the project.
> > At the very least, it needs to be mature and likely to be used.
>
> We wouldn't be reaching out if we weren't likely to want to use such a
> feature. All these concerns/questions/suggestions are exactly what we
> were hoping for. If I can provide any more context/tests/samples, let
> me know.
> (I completely get the concerns about diluting a known security feature
> and have no issue with something like this being a separate
> component.)
>
> >
> > Background: Currently jena-access is in Fuseki main. It is not optional
> > because it predates Fuseki modules.
> >
> > Andy
> >
>
> --
> Vilnis Termanis
> Technical Specialist
>
> e | vilnis.terma...@iotics.com
> www.iotics.com
>
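
The mechanism discussed in the quoted thread (a trusted middleware prepends the set of allowed graphs to the query, and the server reads it from a comment at the start of the query text) could look roughly like the sketch below. The `#ACL ` marker, the JSON encoding and all function names are hypothetical and are not the format used in the PR; the sketch only illustrates the round trip and a per-quad visibility check.

```python
import json

ACL_PREFIX = "#ACL "  # hypothetical marker; '#' starts a comment in SPARQL


def prepend_allowed_graphs(query: str, graphs: list[str]) -> str:
    """Middleware side: put the allowed graph IRIs in a leading comment line."""
    return f"{ACL_PREFIX}{json.dumps(sorted(graphs))}\n{query}"


def read_allowed_graphs(request_text: str) -> tuple[set[str], str]:
    """Server side: split the leading ACL comment from the query text."""
    first, sep, rest = request_text.partition("\n")
    if not sep or not first.startswith(ACL_PREFIX):
        raise ValueError("missing ACL comment")
    return set(json.loads(first[len(ACL_PREFIX):])), rest


def quad_visible(graph_iri: str, allowed: set[str]) -> bool:
    """Quad-filter predicate: a quad is visible only if its graph is allowed."""
    return graph_iri in allowed


# Example graph IRIs are made up for illustration.
text = prepend_allowed_graphs(
    "SELECT * { ?s ?p ?o }",
    ["http://example.org/g2", "http://example.org/g1"],
)
allowed, query = read_allowed_graphs(text)
```

As noted in the thread, a scheme like this is only as secure as the component that writes the comment, so it assumes end users never reach the server directly.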