On 28/07/2022 20:50, Vilnis Termanis wrote:
Hi Andy & Jena development community,

(Answers inline - apologies if I repeat myself)

FYI - Our aim is to enable end-users to make SPARQL queries whilst
respecting visibility restrictions.
I.e. users (indirectly) add sets of related triples to a dataset and
they can choose who has visibility (beyond themselves) over these,
either: Nobody, Everyone or a chosen set (which can be updated). Note
that this restriction is not by a specific subject or predicate.
(Although the sets of triples do have relationships - not all of them
are known in advance.)

Let's clarify terminology here.

A "Jena user" is a person or organisation that downloads Jena, either as the formal release (source code) or as convenience binaries (e.g. jars from Maven Central). Convenience binaries are the more usual case.

That does not mean Iotics users: systems built with Jena have their own users.
(The Apache License applies - including clause 7.)

Responsibility for the product or service being "fit for purpose" lies between the downstream system builder and their users.

using a "SELECT {} 1" query, and
adding a certain set of graphs makes the queries on my laptop take:
~600 graphs ~115ms
~1500 graphs ~162ms
~3k graphs ~240ms
~6k graphs ~400ms

That's an illustration of the current system, but we don't know what causes the cost.

What piece of the code is taking the time?
Maybe the right thing to do is make it faster.
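
As a rough reading of the timings above, here is a minimal sketch (plain Python, nothing Jena-specific) that fits a straight line to the four approximate data points. The function name fit_line is made up for illustration, and the figures are the "~" values quoted above, so the result is only indicative. It separates a fixed per-query cost from a per-graph cost, but it says nothing about which piece of code is responsible.

```python
# Approximate timings quoted above: (number of graphs, milliseconds).
TIMINGS = [(600, 115), (1500, 162), (3000, 240), (6000, 400)]

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(TIMINGS)
print(f"per-graph cost: {slope:.4f} ms, fixed cost: {intercept:.1f} ms")
```

On these figures the growth looks close to linear in the number of graphs, at roughly 0.05 ms per graph on top of a fixed cost of roughly 80 ms. A profiler run would still be needed to answer the question of where the time goes.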

And in the general area - what are you using for authentication?

There is some bearer auth support in the next release ... it does not provide complete bearer auth because it can't cover all cases (e.g. JWT validation). It is more of a framework template with which to build a local solution.

----

"FMod_ABAC" is not related to jena-permissions.

"FMod_" means Fuseki Module.
https://jena.apache.org/documentation/fuseki2/fuseki-modules
   No forks.
ABAC = Attribute Based Access Control.

Using attributes separates ACLs from directly naming users for access to things. In FMod_ABAC, the things are triples. Triples have "labels". Labels are attribute expressions, including AND and OR operators.

    "employee | contractor" -- must have the "employee" attribute
                               or the "contractor" attribute.

    "employee & dept=engineering" -- must have both "employee" and
                                    "dept=engineering" attributes.

There is a division of responsibilities. The data is labelled - so the data owner is responsible for the data attribute requirements. The assignment of attributes to users is separate.
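
To make the label idea concrete, here is a minimal sketch of evaluating a label against a user's attributes. It is not the FMod_ABAC implementation or its syntax rules. The names tokenize and is_visible are hypothetical, and the sketch assumes that "&" binds tighter than "|", that parentheses are allowed, and that an attribute such as "dept=engineering" is one opaque token that the user either holds or does not.

```python
import re

# Assumption: an attribute is any run of characters other than whitespace,
# '&', '|', '(' and ')', so "dept=engineering" is a single opaque token.
_TOKEN = re.compile(r"\s*([&|()]|[^\s&|()]+)")

def tokenize(label):
    label = label.strip()
    tokens, pos = [], 0
    while pos < len(label):
        match = _TOKEN.match(label, pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def is_visible(label, user_attributes):
    """Return True if the user's attributes satisfy the label expression."""
    tokens = tokenize(label)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_atom()
            result = result and rhs
        return result

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of label")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        # An attribute is satisfied when the user holds it.
        return tok in user_attributes

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```

The label travels with the data and the attribute set travels with the user, which is the division of responsibilities described above: is_visible("employee | contractor", {"contractor"}) holds, while is_visible("employee & dept=engineering", {"employee"}) does not.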

FYI - In our case this means that we have a "make SPARQL query" API
call. When received, the applicable user (our domain) is known and, in
the proposed PR, we can prepend the set of allowed graphs to the query
(which have been looked up prior to query execution, externally). The
end user has NO direct access to Fuseki itself.
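
For illustration only, here is a sketch of how a middleware could restrict a query to a set of allowed graphs using the standard SPARQL 1.1 Protocol parameter named-graph-uri, which describes a dynamic dataset for the request. This is not necessarily the mechanism used in the proposed PR. The function name build_query_params and the example graph URIs are made up.

```python
from urllib.parse import urlencode

def build_query_params(query, allowed_graphs):
    """Encode a SPARQL protocol request restricted to the allowed named graphs."""
    if not allowed_graphs:
        # With no dataset description the endpoint would use its default
        # dataset, so an empty ACL must never be sent as "no restriction".
        raise ValueError("refusing to send a query with no allowed graphs")
    params = [("query", query)]
    # One named-graph-uri parameter per graph the user may see.
    params += [("named-graph-uri", graph) for graph in sorted(allowed_graphs)]
    return urlencode(params)

encoded = build_query_params(
    "SELECT * { GRAPH ?g { ?s ?p ?o } }",
    {"http://example/graph/1", "http://example/graph/2"},
)
print(encoded)
```

Under the SPARQL 1.1 Protocol, a dataset given in the request takes precedence over FROM / FROM NAMED clauses in the query string. The restriction is still only as strong as the component that builds the request.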

Your solution presumes a protected network, or possibly a container with in-container networking.

That's my Concern 1: security conditions outside Jena must be met. Having that in Jena, even if not in use, is an issue.

Concern 1:

This bypasses Fuseki-provided security and puts the control function
outside the Fuseki server in a separate server that is not part of Jena.
It will only be secure if deployed in a constrained network environment.

This is not secure except when run in a certain way and, personally, I
don't want to have to deal with a CVE because of that. CVE handling is
time-consuming.

I don't see why it is using jena-access (the named graph security
feature) except for the filtering on TDB. It is creating a dynamic
dataset for the query.

You're right - it's only as secure as the middleware/proxy/whatever in
front of it which supplies the ACL. (This was never intended to be
used/exposed to end-users directly.)

Concern 2: How does update fit into the picture? (GSP is not supported).

I thought that, since GSP operations target a single graph, there is
no need to extend support to it since it's already possible to
restrict visibility (with the graph query parameter). Am I missing
something?

Having different ways to protect data across different operations is confusing, and it makes unexpected problems quite easy to introduce, which is bad for security.

Consider accessing the default graph when it is the union of the named graphs.


Concern 3: It looks like a specific solution for a specific scenario.
Will it get uptake by the wider Jena user community?

It's definitely specific. My thinking was that, if a subset of this
were deemed useful, then it'd be better to exist as part of the core
offering as opposed to us just bolting it on ourselves (at my job).
But, if that's not the case - fair enough.

What subsets do you have in mind?

    Andy
