On Fri, Jul 29, 2022 at 7:27 PM Vilnis Termanis <vilnis.terma...@iotics.com.invalid> wrote:
>
> (inline)
>
> On Fri, 29 Jul 2022 at 07:56, Martynas Jusevičius
> <marty...@atomgraph.com> wrote:
> >
> > “Sets of triples” - aren’t these datasets?
> >
> > Couldn’t this use case be addressed by maintaining per-user datasets? Not
> > sure if Fuseki can create datasets on the fly, but this seems like a much
> > simpler feature to implement compared to a whole new ACL mechanism.
>
> The idea is that, if you had these "sets of triples" A-Z, one user
> might be allowed to see A-M and another C-Q. With per-user datasets
> you'd have to duplicate data to achieve that. And, when the ACL
> changes, you'd have to copy/move triples from one dataset to another.
> (Or am I missing a nuance to your proposal? Do you mean dynamically
> creating a new dataset which references graphs from another dataset?)

No, not missing :) I mean it sounds like a useful feature, and we
could probably find a use for it ourselves.

But if the ACL is graph-scoped, can't it employ an existing ontology
such as WAC? [1] It would be eating your own dog food, and of course,
it being RDF, you could query and update your ACL using SPARQL.
That would probably require a meta-dataset containing ACL data for
each secured dataset.

As it happens, we have an authorization request filter for Jersey that
checks WAC access using SPARQL:
https://github.com/AtomGraph/LinkedDataHub/blob/master/src/main/java/com/atomgraph/linkeddatahub/server/filter/request/AuthorizationFilter.java

The SPARQL query:
https://github.com/AtomGraph/LinkedDataHub/blob/master/src/main/webapp/WEB-INF/web.xml#L25

[1] https://www.w3.org/wiki/WebAccessControl

>
> >
> > On Thu, 28 Jul 2022 at 22.51, Vilnis Termanis
> > <vilnis.terma...@iotics.com.invalid> wrote:
> >
> > > Hi Andy & Jena development community,
> > >
> > > (Answers inline - apologies if I repeat myself)
> > >
> > > FYI - Our aim is to enable end-users to make SPARQL queries whilst
> > > respecting visibility restrictions.
> > > I.e. users (indirectly) add sets of related triples to a dataset and
> > > they can choose who has visibility (beyond themselves) over these,
> > > either: Nobody, Everyone or a chosen set (which can be updated). Note
> > > that this restriction is not by a specific subject or predicate.
> > > (Although the sets of triples do have relationships - not all of them
> > > are known in advance.)
> > >
> > > On Thu, 28 Jul 2022 at 10:43, Andy Seaborne <a...@apache.org> wrote:
> > > >
> > > > JENA-2339
> > > > PR#1441
> > > >
> > > > https://github.com/vtermanis/jena/blob/dynamic-graph-restriction-extension/MOVE_ME_DynamicACL_notes.md
> > > >
> > > > tl;dr:
> > > >
> > > > It is a different role for Fuseki.
> > > >
> > > > Fuseki executes the security but the setup and control is from a trusted
> > > > external server on the request execution path.
> > > >
> > > > It assumes certain deployment environments to be safe.
> > >
> > > FYI - In our case this means that we have a "make SPARQL query" API
> > > call. When received, the applicable user (our domain) is known and, in
> > > the proposed PR, we can prepend the set of allowed graphs to the query
> > > (which have been looked up prior to query execution, externally). The
> > > end user has NO direct access to Fuseki itself.
> > >
> > > >
> > > > My feeling is that we should make Fuseki configurable enough so that a
> > > > downstream 3rd party can add their security solution that is suitable
> > > > for their environment. But we should not incorporate a particular
> > > > security solution that relies on the deployment environment.
> > > >
> > > > ----
> > > >
> > > > I've asked for more information about the claim on a performance
> > > > motivator and some other background information.
> > > >
> > > > The usage patterns are not yet clear. The data is described as "a one
> > > > graph per handful of subjects and their properties" and "100s of
> > > > graphs". What the queries are is unstated.
> > >
> > > Right now, each graph has in the range of 300-500 triples (though the
> > > amount depends on how much additional/domain-specific metadata
> > > end-users choose to add) and the scale of deployed Fuseki datasets
> > > ranges from having a few to ~6k graphs.
> > > Since we'd like to allow end-users to run **any** queries they wish
> > > (we enforce query timeouts), it's difficult to give concrete examples.
> > > I can however say that TDB unionDefaultGraph mode is enabled (i.e.
> > > most end-users won't choose to explicitly target a specific graph) and
> > > that one of our representative "search" queries (which combines
> > > GeoSPARQL + multiple explicit property matching across multiple
> > > different subjects in a UNION + subsequent collection of mandatory &
> > > optional fields) is 20-40% faster than the current custom
> > > solution.
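
To make the "prepend the set of allowed graphs to the query" step quoted
above a bit more concrete, here is a minimal sketch of what the trusted
middleware side might look like. It is not the code from the PR: the
"#ACL-GRAPHS:" header name, the one-line layout and both function names
are assumptions invented for illustration, and the real PR may encode
the graph list quite differently.

```python
import re
from typing import List, Tuple

# Hypothetical header marker; the format actually used by the PR is not known here.
ACL_PREFIX = "#ACL-GRAPHS:"


def prepend_allowed_graphs(query: str, graphs: List[str]) -> str:
    """Return the query with a leading comment line listing the visible graph IRIs."""
    for iri in graphs:
        # Reject whitespace, angle brackets and quotes: they could end the
        # comment line early or break the <...> delimiting of the IRIs.
        if re.search(r'[\s<>"]', iri):
            raise ValueError(f"unsafe graph IRI: {iri!r}")
    header = ACL_PREFIX + " " + " ".join(f"<{g}>" for g in graphs)
    return header + "\n" + query


def read_allowed_graphs(request_text: str) -> Tuple[List[str], str]:
    """Split a request into (visible graph IRIs, remaining query text)."""
    first, _, rest = request_text.partition("\n")
    if not first.startswith(ACL_PREFIX):
        raise ValueError("missing ACL header")
    return re.findall(r"<([^<>\s]+)>", first), rest
```

Because the middleware always writes the first line itself, anything the
end user puts in the query text can only appear after the header. That
holds only while the middleware is the sole client of Fuseki, which is
the deployment assumption discussed in this thread.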
> > > (Note that we have also tried query re-writing to insert FROM/FROM
> > > NAMED clauses - and that is very slow in comparison, presumably due to the
> > > higher-level filtering involved, unlike the quad filter herein.)
> > >
> > > >
> > > > There is no characterisation of the queries being made. If we are
> > > > talking about overheads, the cases of a few big queries and many small
> > > > queries are different.
> > >
> > > (pasted from JENA-2339 ticket) - using a "SELECT {} 1" query, and
> > > adding a certain set of graphs makes the queries on my laptop take:
> > > ~600 graphs ~115ms
> > > ~1500 graphs ~162ms
> > > ~3k graphs ~240ms
> > > ~6k graphs ~400ms
> > >
> > > >
> > > > The scale looks small (less than a million triples -
> > > > approximating as 100 graphs * 1000 triples). That makes the point about
> > > > access to TDB hooks a bit redundant.
> > >
> > > The dataset I've tested this with has ~1.8M triples. That's not to say
> > > this is the scale we're hoping to satisfy - that's just what I
> > > tested with first. By redundant, do you mean an alternative approach
> > > should be used for this scale?
> > >
> > > >
> > > >
> > > > There are distinguished users. A request from one of these users
> > > > causes the set of visible graphs to be read from a comment at the start
> > > > of the query text in the request.
> > > >
> > > > The use of large numbers of small named graphs to manage security
> > > > settings looks to me like triple-level security. I have already
> > > > mentioned work "FMod_ABAC": (£job related) a while back (2/Jan/2022). It
> > > > is triple-level attribute-based security.
> > >
> > > It could well be that I'm seeing the wrong solution for the feature
> > > we're trying to support (that's the other reason for reaching out to
> > > the community).
> > > The reason (rightly or wrongly) to model this as a set
> > > of graphs is: The triples in each set to be restricted are related, but
> > > span multiple subjects and could also relate to other subjects in
> > > other sets (as well as externally).
> > > Hence I couldn't see how e.g. Jena Permissions could be applied here:
> > > When you're provided with a single triple to check - you would have to
> > > understand what type of subject it is and how it relates to the "top
> > > level" subject to which the ACL applies. Bundling everything into a
> > > graph seemed like a viable option.
> > >
> > > >
> > > > Concern 1:
> > > >
> > > > This bypasses Fuseki-provided security and puts the control function
> > > > outside the Fuseki server in a separate server that is not part of Jena.
> > > > It will only be secure if deployed in a constrained network environment.
> > > >
> > > > This is not secure except when run in a certain way and, personally, I
> > > > don't want to have to deal with a CVE because of that. CVE handling is
> > > > time-consuming.
> > > >
> > > > I don't see why it is using jena-access (the named graph security
> > > > feature) except for the filtering on TDB. It is creating a dynamic
> > > > dataset for the query.
> > >
> > > You're right - it's only as secure as the middleware/proxy/whatever in
> > > front of it which supplies the ACL. (This was never intended to be
> > > used/exposed to end-users directly.)
> > > The purpose of extending jena-access (instead of immediately writing
> > > it as a separate module) was to illustrate with minimal code changes
> > > (+ extension of existing tests) what it could look like, for
> > > discussion. (The quad filtering / performance aspect would be the
> > > same, regardless of location, I presume.)
> > >
> > > >
> > > > Concern 2: How does update fit into the picture? (GSP is not supported).
> > >
> > > I thought that, since GSP operations target a single graph, there is
> > > no need to extend support to it since it's already possible to
> > > restrict visibility (with the graph query parameter). Am I missing
> > > something?
> > >
> > > >
> > > > Concern 3: It looks like a specific solution for a specific scenario.
> > > > Will it get uptake by the wider Jena user community?
> > >
> > > It's definitely specific. My thinking was that, if a subset of this
> > > were deemed useful, then it'd be better to exist as part of the core
> > > offering as opposed to us just bolting it on ourselves (at my job).
> > > But, if that's not the case - fair enough.
> > >
> > > >
> > > > Concern 4: Is there long-term support and maintenance for the feature?
> > > > (e.g. 5y+)
> > > > How do we respond to users@ messages about it? Is it experimental code or
> > > > has it been used for real? Is the feature set stable?
> > >
> > > My understanding is that jena-access is classed as stable (we're using
> > > it for something else already in production) and thus this change, since it
> > > merely produces a SecurityContext with a larger set of graphs, would
> > > theoretically be no less stable.
> > >
> > > >
> > > >
> > > > Opinion: it is not unreasonable to provide support for this kind of
> > > > customization of Fuseki.
> > > >
> > > > An extension can then provide whatever security is needed for the
> > > > situation and it is the Fuseki user/operator making the decisions about
> > > > what is acceptable security and what isn't.
> > > >
> > > > Fuseki has ways to add custom processors and this seems the way to
> > > > provide an alternative way to make queries.
> > > >
> > > > Putting it in the distribution codebase is a big step for the project.
> > > > At the very least, it needs to be mature and likely to be used.
> > >
> > > We wouldn't be reaching out if we weren't likely to want to use such a
> > > feature.
> > > All these concerns/questions/suggestions are exactly what we
> > > were hoping for. If I can provide any more context/tests/samples, let
> > > me know.
> > > (I completely get the concerns about diluting a known security feature
> > > and have no issue with something like this being a separate
> > > component.)
> > >
> > > >
> > > > Background: Currently jena-access is in Fuseki main. It is not optional
> > > > because it predates Fuseki modules.
> > > >
> > > > Andy
> > > >
> > >
> > > --
> > > Vilnis Termanis
> > > Technical Specialist
> > >
> > > e | vilnis.terma...@iotics.com
> > > www.iotics.com
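
A footnote on the WAC suggestion near the top of this message: the
sketch below shows roughly what a graph-scoped check against a
meta-dataset of ACL triples could look like. It only builds the text of
a SPARQL ASK query using terms from the WAC vocabulary (acl:Authorization,
acl:accessTo, acl:mode, acl:agent). It is not the LinkedDataHub query
linked above. Using a named graph IRI as the acl:accessTo target is an
assumption, the function and example IRIs are hypothetical, and
acl:agentClass and acl:default are ignored for brevity.

```python
import re

ACL_NS = "http://www.w3.org/ns/auth/acl#"
WAC_MODES = {"Read", "Write", "Append", "Control"}


def _check_iri(iri: str) -> str:
    """Refuse characters that are not allowed inside a SPARQL <...> IRI reference."""
    if not iri or re.search(r'[\s<>"{}|\\^`]', iri):
        raise ValueError(f"unsafe IRI: {iri!r}")
    return iri


def wac_ask_query(agent_iri: str, graph_iri: str, mode: str = "Read") -> str:
    """Build an ASK query: does some authorization grant `mode` on the graph to the agent?"""
    if mode not in WAC_MODES:
        raise ValueError(f"unknown WAC mode: {mode!r}")
    agent = _check_iri(agent_iri)
    graph = _check_iri(graph_iri)
    return (
        f"PREFIX acl: <{ACL_NS}>\n"
        "ASK {\n"
        "  ?auth a acl:Authorization ;\n"
        f"        acl:accessTo <{graph}> ;\n"
        f"        acl:mode acl:{mode} ;\n"
        f"        acl:agent <{agent}> .\n"
        "}"
    )


if __name__ == "__main__":
    # Example IRIs are made up for illustration.
    print(wac_ask_query("http://example.org/users/alice#me",
                        "http://example.org/graphs/A"))
```

The resulting query would be run against the dataset holding the ACL
triples, not against the secured dataset itself, and changing who can
see a graph becomes an ordinary SPARQL update on that ACL dataset.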