On Mon, 8 Aug 2022 at 17.21, Vilnis Termanis <vilnis.terma...@iotics.com.invalid> wrote:
> On Sat, 30 Jul 2022 at 21:14, Martynas Jusevičius > <marty...@atomgraph.com> wrote: > > > > On Fri, Jul 29, 2022 at 7:27 PM Vilnis Termanis > > <vilnis.terma...@iotics.com.invalid> wrote: > > > > > > (inline) > > > > > > On Fri, 29 Jul 2022 at 07:56, Martynas Jusevičius > > > <marty...@atomgraph.com> wrote: > > > > > > > > “Sets of triples” — aren’t these datasets? > > > > > > > > Couldn’t this use case be addressed by maintaining per-user > datasets? Not > > > > sure if Fuseki can create datasets on the fly, but this seems like a > much > > > > simpler feature to implement compared to a whole new ACL mechanism. > > > > > > The idea is, that if you had these "sets of triples" A-Z, one user > > > might be allowed to see A-M and another C-Q. With per-user datasets > > > you'd have to duplicate data to achieve that. And, when the ACL > > > changes, you'd have to copy/move triples from one dataset to another. > > > (Or am I missing a nuance to your proposal? Do you mean dynamically > > > creating a new dataset which references graphs from another dataset?) > > > > No, not missing :) > > > > I mean it sounds like a useful feature, and we could probably find use > > for it ourselves. > > > > But if the ACL is graph-scoped, can't it employ an existing ontology > > such as WAC? [1] > > It would be eating your own dogfood, and of course it being RDF you > > could query and update your ACL using SPARQL.That would probably > > require a meta-dataset containing ACL data for each secured dataset. > > It definitely could (and in fact, we are doing pretty much what you > describe right now). > However, thinking in general terms - aren't there two levels to such > an ACL solution: > > 1) ACL is treated as completely external to Jena/Fuseki: Something > else is responsible for providing the "allow list" of graphs. (And: > Ideally there is no hard requirement to require a Java integration to > use the feature.) 
This option has the advantage of not being specific to Fuseki, meaning that it can work with any triplestore. The access check is encapsulated as a SPARQL query and can be easily reused across frameworks.
> 2) ACL is enabled by storing rules in a specific graph in a Jena > dataset (and there I agree WAC seems very sensible - as you've > linked). > > I'm querying about (1) where Jena/Fuseki is not necessarily the centre > of the picture, but part of multiple components. > > > > > As it happens we have an authorization request filter for Jersey that > > checks WAC access using SPARQL: > > > https://github.com/AtomGraph/LinkedDataHub/blob/master/src/main/java/com/atomgraph/linkeddatahub/server/filter/request/AuthorizationFilter.java > > The SPARQL query: > > > https://github.com/AtomGraph/LinkedDataHub/blob/master/src/main/webapp/WEB-INF/web.xml#L25 > > > > [1] https://www.w3.org/wiki/WebAccessControl > > > > > > > > > > > > > On Thu, 28 Jul 2022 at 22.51, Vilnis Termanis > > > > <vilnis.terma...@iotics.com.invalid> wrote: > > > > > > > > > Hi Andy & Jena development community, > > > > > > > > > > (Answers inline - apologies if I repeat myself) > > > > > > > > > > FYI - Our aim is to enable end-users to make SPARQL queries whilst > > > > > respecting visibility restrictions. > > > > > I.e. users (indirectly) add sets of related triples to a dataset > and > > > > > they can choose who has visibility (beyond themselves) over these, > > > > > either: Nobody, Everyone or a chosen set (which can be updated). > Note > > > > > that this restriction is not by a specific subject or predicate. > > > > > (Although the sets of triples do have relationships - not all of > them > > > > > are known in advance.) 
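To make the A-Z example from earlier in the thread concrete, here is a minimal Python sketch of the visibility model quoted above (Nobody, Everyone, or a chosen set of users per set of triples). The names Visibility and visible_graphs, and the users "admin", "alice" and "bob", are illustrative assumptions and not part of Jena or the PR. It only shows that overlapping views (A-M for one user, C-Q for another) can be derived from a single shared copy of the data.

```python
from dataclasses import dataclass, field
from string import ascii_uppercase

# Hypothetical model: each "set of triples" is one named graph with a visibility rule.
@dataclass
class Visibility:
    owner: str
    public: bool = False                        # "Everyone"
    allowed: set = field(default_factory=set)   # chosen set; empty means "Nobody"

def visible_graphs(user, acl):
    """Return the graph names the user may see; no triples are copied."""
    return sorted(
        g for g, v in acl.items()
        if v.public or user == v.owner or user in v.allowed
    )

# 26 graphs named A-Z, all owned by a made-up "admin" user.
acl = {g: Visibility(owner="admin") for g in ascii_uppercase}
for g in ascii_uppercase[:13]:      # A-M
    acl[g].allowed.add("alice")
for g in ascii_uppercase[2:17]:     # C-Q
    acl[g].allowed.add("bob")

alice = visible_graphs("alice", acl)
bob = visible_graphs("bob", acl)
shared = sorted(set(alice) & set(bob))   # graphs both users can see
```

Changing who can see a graph is then an update to the allowed set, with no copying or moving of triples between datasets.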
> > > > > > > > > > On Thu, 28 Jul 2022 at 10:43, Andy Seaborne <a...@apache.org> > wrote: > > > > > > > > > > > > JENA-2339 > > > > > > PR#1441 > > > > > > > > > > > > https://github.com/vtermanis/jena/blob/dynamic-graph-restriction-extension/MOVE_ME_DynamicACL_notes.md > > > > > > > > > > > > tl;dr: > > > > > > > > > > > > It is a different role for Fuseki. > > > > > > > > > > > > Fuseki execute the security but the setup and control is from a > trusted > > > > > > external server on the request execution path. > > > > > > > > > > > > It assumes certain deployment environments to be safe. > > > > > > > > > > FYI - In our case this means that we have a "make SPARQL query" API > > > > > call. When received, the applicable user (our domain) is known > and, in > > > > > the proposed PR, we can prepend the set of allowed graphs to the > query > > > > > (which have been looked up prior to query execution, externally). > The > > > > > end user has NO direct access to Fuseki itself. > > > > > > > > > > > > > > > > > My feeling is that we should make Fuseki configurable enough so > that a > > > > > > downstream 3rd party can add their security solution that is > suitable > > > > > > for their environment. But we should not incorporate a particular > > > > > > security solution that relies on the deployment environment. > > > > > > > > > > > > ---- > > > > > > > > > > > > I've asked for more information about the claim on a performance > > > > > > motivator and some other background information. > > > > > > > > > > > > The usage patterns are not yet clear. The data is described as > "a one > > > > > > graph per handful of subjects and their properties" and "100s of > > > > > > graphs". What the queries are is unstated. 
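The "prepend the set of allowed graphs to the query" step mentioned above could look roughly like the following Python sketch. The thread does not give the actual syntax used by the PR, so the "#allowed-graphs: " marker and both function names are invented for illustration. The allow list travels as a leading SPARQL comment, so the text remains a valid query.

```python
import json

PREFIX = "#allowed-graphs: "  # hypothetical marker, not the PR's actual syntax

def prepend_allowed_graphs(query, graphs):
    """Middleware side: put the externally looked-up allow list in a leading comment."""
    return PREFIX + json.dumps(sorted(graphs)) + "\n" + query

def split_allowed_graphs(text):
    """Server side: recover (allow list, original query); no marker means no graphs."""
    first, _, rest = text.partition("\n")
    if not first.startswith(PREFIX):
        return [], text
    return json.loads(first[len(PREFIX):]), rest

query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
sent = prepend_allowed_graphs(query, {"urn:g:2", "urn:g:1"})
graphs, original = split_allowed_graphs(sent)
```

Only the first line is inspected, so a marker typed by an end user inside their own query text ends up below the one added by the middleware and is ignored. As noted in the thread, this is only as safe as the component in front of Fuseki that adds the comment.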
> > > > > > > > > > Right now, each graph has in the range of 300-500 triples (though > the > > > > > amount depends on how much additional/domain-specific metadata > > > > > end-users choose to add) and the scale of deployed Fuseki datasets > > > > > range from having a few to ~6k graphs. > > > > > Since we'd like to allow end-users to run **any** queries they wish > > > > > (we enforce query timeouts), it's difficult to give concrete > examples. > > > > > I can however say that TDB unionDefaultGraph mode is enabled (i.e. > > > > > most end-users won't choose to explicitly target a specific graph) > and > > > > > that one of our representative "search" queries (which combines > > > > > GeoSPARQL + multiple explicit property matching across multiple > > > > > different subjects in a UNION + subsequent collection of mandatory > & > > > > > optional fields) is between 20-40% faster than the current custom > > > > > solution. > > > > > (Note that we have also tried query re-writing to insert FROM/FROM > > > > > NAMED clauses - and that is very slow in comparison, presumably to > the > > > > > higher level filtering involved, unlike the quad filter herein.) > > > > > > > > > > > > > > > > > There is no characterisation of the queries being made. If we are > > > > > > talking about overheads, the cases of a few big queries and many > small > > > > > > queries are different. > > > > > > > > > > (pasted from JENA-2339 ticket) - using a "SELECT {} 1" query, and > > > > > adding a certain set of graphs makes the queries on my laptop take: > > > > > ~600 graphs ~115ms > > > > > ~1500 graphs ~162ms > > > > > ~3k graphs ~240ms > > > > > ~6k graphs ~400ms > > > > > > > > > > > > > > > > > The scale looks small (less than a million triples of triples - > > > > > > approximating as 100 graphs * 1000 triples). That makes the > point about > > > > > > access to TDB hooks a bit redundant. > > > > > > > > > > The dataset I've tested this with has ~1.8M triples. 
That's not to > say > > > > > this is the scale we're hoping to satisfy - that's the just what I > > > > > tested with first. By redundant, do you mean an alternative > approach > > > > > should be used for this scale? > > > > > > > > > > > > > > > > > > > > > > > There is are distinguished users. A request from one of these > users > > > > > > causes the set of visible graphs to be read from a comment at > the start > > > > > > of the query text in the request. > > > > > > > > > > > > The use of large numbers of small named graphs to manage security > > > > > > settings looks to me like triple-level security. I have already > > > > > > mentioned work "FMod_ABAC": (£job related) awhile back > (2/Jan/2022). It > > > > > > is triple level attribute-based security. > > > > > > > > > > It could well be that I'm seeing the wrong solution for the feature > > > > > we're trying to support (that's the other reason for reaching out > to > > > > > the community. The reason (rightly or wrongly) to model this as a > set > > > > > of graphs is: Each set of triples to be restricted are related, but > > > > > span multiple subjects and could also relate to other subjects in > > > > > other sets (as well as externally). > > > > > Hence I couldn't see how e.g. Jena Permissions could be applied > here: > > > > > When you're provided with a single triple to check - you would > have to > > > > > understand what type subject it is and how it relates to the "top > > > > > level" subject to which the ACL applies. Bundling everything into a > > > > > graph seemed like viable option. > > > > > > > > > > > > > > > > > Concern 1: > > > > > > > > > > > > This by passes Fuseki-provided security and puts the control > function > > > > > > outside the Fuseki server in a separate server that is not part > of Jena. > > > > > > It will only be secure if deployed in a constrained network > environment. 
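The graph-per-set modelling discussed above amounts to filtering quads by graph name before the union default graph is formed. Here is a minimal Python sketch of that idea. It is not Jena's TDB quad filter API: plain tuples stand in for quads, and all IRIs are made up.

```python
# Quads as (graph, subject, predicate, object) tuples; all data here is made up.
quads = [
    ("urn:g:1", "urn:s:a", "urn:p:name", "A"),
    ("urn:g:1", "urn:s:b", "urn:p:partOf", "urn:s:a"),
    ("urn:g:2", "urn:s:c", "urn:p:name", "C"),
    ("urn:g:3", "urn:s:d", "urn:p:near", "urn:s:a"),
]

def union_default_graph(quads, allowed):
    """Yield triples from allowed graphs only, dropping the graph name."""
    allowed = frozenset(allowed)
    for g, s, p, o in quads:
        if g in allowed:  # one set lookup per quad; subject types are never inspected
            yield (s, p, o)

triples = list(union_default_graph(quads, {"urn:g:1", "urn:g:3"}))
```

The check never needs to know what kind of subject a triple describes or how it relates to a "top level" subject, which is the difficulty raised above for a per-triple check.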
> > > > > > > > > > > > This is not secure except when run in a certain way and, > personally, I > > > > > > don't want to have to deal with a CVE because of that. CVE > handling is > > > > > > time consuming. > > > > > > > > > > > > I don't see why it is using jena-access (the named graph security > > > > > > feature) except for the filtering on TDB. It is creating a > dynamic > > > > > > dataset for the query. > > > > > > > > > > You're right - it's only as secure as the > middleware/proxy/whatever in > > > > > front of it which supplies the ACL. (This was never intended to be > > > > > used/exposed to end-users directly.) > > > > > The purpose of extending jena-access (instead of immediately > writing > > > > > it as a separate module) was to illustrate with minimal code > changes > > > > > (+ extension of existing tests) what it could look like, for > > > > > discussion. (The quad filtering / performance aspect would be the > > > > > same, regardless of location, I presume.) > > > > > > > > > > > > > > > > > Concern 2: How does update fit into the picture? (GSP is not > supported). > > > > > > > > > > I thought that, since GSP operations target a single graph, there > is > > > > > no need to extend support to it since it's already possible to > > > > > restrict visibility (with the graph query parameter). Am I missing > > > > > something? > > > > > > > > > > > > > > > > > Concern 3: It looks like a specific solution for a specific > scenario. > > > > > > Will it get uptake by the wide Jena user community? > > > > > > > > > > It's definitely specific. My thinking was that, if a subset of this > > > > > were deemed useful, then it'd be better to exist as part of the > core > > > > > offering as opposed to us just bolting it on ourselves (at my job). > > > > > But, if that's not the case - fair enough. > > > > > > > > > > > > > > > > > Concern 4: Is there long-term support and maintenance for the > feature? > > > > > > (e.g. 
5y+) > > > > > > How do we respond to users@ message about it? Is it > experimental code or > > > > > > has it been used for real? Is the feature set stable? > > > > > > > > > > My understanding is that jena-access is classed as stable (we're > using > > > > > it for something else already in production) and thus, since this > > > > > merely produces a SecurityContext with a larger set of graphs, > would > > > > > theoretically be no less stable. > > > > > > > > > > > > > > > > > > > > > > > Opinion: it is not unreasonable to provide support for this kind > of > > > > > > customization of Fuseki. > > > > > > > > > > > > An extension can then provide whatever security is needed for the > > > > > > situation and it is the Fuseki user/operator making the > decisions about > > > > > > what is acceptable security and what isn't. > > > > > > > > > > > > Fuseki has ways to add custom processors and this seems the way > to > > > > > > provide an alternative way to make queries. > > > > > > > > > > > > Putting it in the distribution codebase is a big step for the > project. > > > > > > At the very least, it needs to be mature and likely to be used. > > > > > > > > > > We wouldn't be reaching out if we weren't likely to want to use > such a > > > > > feature. All these concerns/questions/suggestions are exactly what > we > > > > > were hoping for. If I can provide any more context/tests/samples, > let > > > > > me know. > > > > > (I completely get the concerns about diluting a known security > feature > > > > > and have no issue with something like this being a separate > > > > > component.) > > > > > > > > > > > > > > > > > Background: Currently jena-access is in Fuseki main. It is not > optional > > > > > > because it predates Fuseki modules. 
> > > > > > > > > > > > Andy > > > > > > > > > > > > > > > > > > > > -- > > > > > Vilnis Termanis > > > > > Technical Specialist > > > > > > > > > > e | vilnis.terma...@iotics.com > > > > > www.iotics.com > > > > > > > > > > > > > > > > > -- > > > Vilnis Termanis > > > Technical Specialist > > > > > > e | vilnis.terma...@iotics.com > > > www.iotics.com > > > > > > The information contained in this email is strictly confidential and > > > intended only for the parties noted. If this email was not intended > > > for your use, please contact Iotics. For more on our Privacy Policy > > > please visit https://www.iotics.com/legal/