Re: [VOTE] Apache Jena 5.1.0

2024-07-17 Thread Arne Bernhardt
[x] +1 Approve the release

Microsoft Windows [Version 10.0.19045.4529] + Eclipse
Adoptium\jdk-21.0.3.9-hotspot

[INFO] Reactor Summary for Apache Jena 5.1.0:
[INFO]
[INFO] Apache Jena  SUCCESS [ 25.815 s]
[INFO] Apache Jena - IRI .. SUCCESS [  7.456 s]
[INFO] Apache Jena - Base . SUCCESS [  9.626 s]
[INFO] Apache Jena - Core . SUCCESS [01:03 min]
[INFO] Apache Jena - ARQ .. SUCCESS [01:04 min]
[INFO] Apache Jena - ONTAPI ... SUCCESS [ 24.335 s]
[INFO] Apache Jena - SHACL  SUCCESS [ 10.771 s]
[INFO] Apache Jena - ShEx . SUCCESS [ 10.672 s]
[INFO] Apache Jena - RDF Patch  SUCCESS [  4.400 s]
[INFO] Apache Jena - RDF Connection ... SUCCESS [  6.725 s]
[INFO] Apache Jena - DBOE Database Operation Environment .. SUCCESS [  0.209 s]
[INFO] Apache Jena - DBOE Base  SUCCESS [  6.464 s]
[INFO] Apache Jena - DBOE Transactions  SUCCESS [  5.469 s]
[INFO] Apache Jena - DBOE Indexes . SUCCESS [  3.063 s]
[INFO] Apache Jena - DBOE Index test suite  SUCCESS [  0.885 s]
[INFO] Apache Jena - DBOE Transactional Datastructures  SUCCESS [ 31.546 s]
[INFO] Apache Jena - DBOE Storage . SUCCESS [  4.855 s]
[INFO] Apache Jena - TDB1 (Native Triple Store) ... SUCCESS [ 21.913 s]
[INFO] Apache Jena - TDB2 (Native Triple Store) ... SUCCESS [ 14.953 s]
[INFO] Apache Jena - Libraries POM  SUCCESS [  0.589 s]
[INFO] Apache Jena - Command line tools ... SUCCESS [ 13.320 s]
[INFO] Apache Jena - SPARQL Text Search ... SUCCESS [ 14.471 s]
[INFO] Apache Jena - Fuseki - A SPARQL 1.1 Server . SUCCESS [  0.078 s]
[INFO] Apache Jena - Fuseki Core Engine ... SUCCESS [ 12.622 s]
[INFO] Apache Jena - Fuseki UI  SUCCESS [ 29.544 s]
[INFO] Apache Jena - Fuseki Data Access Control ... SUCCESS [  8.541 s]
[INFO] Apache Jena - Fuseki Server Main ... SUCCESS [ 37.715 s]
[INFO] Apache Jena - Fuseki Server Jar  SUCCESS [  3.523 s]
[INFO] Apache Jena - Fuseki Webapp  SUCCESS [ 11.306 s]
[INFO] Apache Jena - Fuseki WAR File .. SUCCESS [  3.410 s]
[INFO] Apache Jena - Fuseki Server Standalone Jar . SUCCESS [  4.537 s]
[INFO] Apache Jena - Fuseki Docker Tools .. SUCCESS [  0.315 s]
[INFO] Apache Jena - Fuseki Binary Distribution ... SUCCESS [  6.036 s]
[INFO] Apache Jena - GeoSPARQL Engine . SUCCESS [ 12.617 s]
[INFO] Apache Jena - Fuseki with GeoSPARQL Engine . SUCCESS [ 11.034 s]
[INFO] Apache Jena - Integration Testing .. SUCCESS [ 10.160 s]
[INFO] Apache Jena - Benchmark Suite .. SUCCESS [  0.059 s]
[INFO] Apache Jena - Benchmarks Shaded Jena 4.8.0 . SUCCESS [  3.732 s]
[INFO] Apache Jena - Benchmarks JMH ... SUCCESS [  3.180 s]
[INFO] Apache Jena - Distribution . SUCCESS [  6.410 s]
[INFO] Apache Jena - Security Permissions . SUCCESS [ 17.780 s]
[INFO] Apache Jena - Extras ... SUCCESS [  0.060 s]
[INFO] Apache Jena - Extras - Query Builder ... SUCCESS [ 10.373 s]
[INFO] Apache Jena - CommonsRDF for Jena .. SUCCESS [  2.661 s]
[INFO] Apache Jena - Extras - Service Enhancer  SUCCESS [  6.154 s]
[INFO] Apache Jena - Code Examples  SUCCESS [  3.083 s]
[INFO] Apache Jena - BOM .. SUCCESS [  0.056 s]
[INFO]

[INFO] BUILD SUCCESS
[INFO]

[INFO] Total time:  09:10 min
[INFO] Finished at: 2024-07-17T14:05:55+02:00
[INFO]



On Fri, 12 Jul 2024 at 12:55, Andy Seaborne wrote:

> Hi,
>
> Here is a vote on the first release candidate for Apache Jena version
> 5.1.0.
>
>  Release Vote
>
> This vote will be open until at least
>
>  Tuesday 16th July, 2024 at 08:00 UTC
>
> Please vote to approve this release:
>
>  [ ] +1 Approve the release
>  [ ]  0 Don't care
>  [ ] -1 Don't release, because ...
>
> Everyone, not just committers, is invited to test and vote.
> Please download and test the proposed release. See the checklist below.
>
> Staging repository:
>https://repository.apache.org/content/repositories/orgapachejena-1065
>
> Proposed dist/ area:
>https://dist.apache.org/repos/dist/dev/jena/
>
> 

Re: [VOTE] Apache Jena 5.1.0

2024-07-17 Thread Arne Bernhardt
Sorry for the delay. I will check out, build and then vote within the next
hour.

   Arne

On Wed, 17 Jul 2024 at 13:15, Andy Seaborne wrote:

> Could we have another PMC vote please?
>
>  Andy
>
> On 12/07/2024 11:53, Andy Seaborne wrote:
> > Hi,
> >
> > Here is a vote on the first release candidate for Apache Jena version
> > 5.1.0.
> >
> >  Release Vote
> >
> > This vote will be open until at least
> >
> >  Tuesday 16th July, 2024 at 08:00 UTC
> >
> > Please vote to approve this release:
> >
> >  [ ] +1 Approve the release
> >  [ ]  0 Don't care
> >  [ ] -1 Don't release, because ...
> >
> > Everyone, not just committers, is invited to test and vote.
> > Please download and test the proposed release. See the checklist below.
> >
> > Staging repository:
> >https://repository.apache.org/content/repositories/orgapachejena-1065
> >
> > Proposed dist/ area:
> >https://dist.apache.org/repos/dist/dev/jena/
> >
> > Keys:
> >https://svn.apache.org/repos/asf/jena/dist/KEYS
> >
> > Git commit (browser URL):
> >https://github.com/apache/jena/commit/8cf1104383
> >
> > Git Commit Hash:
> >8cf11043838e312ab6ee82737de664e62d155cd1
> >
> > Git Commit Tag:
> >jena-5.1.0
> >
> > If you expect to check the release but the time limit does not work
> > for you, please email within the schedule above.
> >
> >  In this release
> >
> > Issues in this release:
> >
> >https://s.apache.org/jena-5.1.0-issues
> >
> > The major item for this release is a new artifact, jena-ontapi.
> >
> > It has API support for working with OWL2 as well as other ontologies. It
> > is the long-term replacement for org.apache.jena.ontology.
> >
> >https://github.com/apache/jena/issues/2160
> >
> > This is a contribution from @sszuev
> >
> > == Also
> >
> > @karolina-telicent
> >Prefixes Service
> >  New endpoint for Fuseki to give read and read-write access to the
> >  prefixes of a dataset enabling lookup and modification over HTTP.
> >https://github.com/apache/jena/issues/2543
> >
> > Micrometer - Prometheus upgrade
> >See https://github.com/micrometer-metrics/micrometer/wiki/1.13-
> > Migration-Guide
> >https://github.com/apache/jena/pull/2480
> >
> > Value space of rdf:XMLLiteral changed to be RDF 1.1/1.2 value semantics.
> >Issue https://github.com/apache/jena/issues/2430
> >The value space in RDF 1.0 was different.
> >
> > @TelicentPaul - Paul Gallagher
> > Migrating Base 64 operations from Apache Commons Codec to Util package.
> >https://github.com/apache/jena/pull/2409
> >
> > Balduin Landolt @BalduinLandolt
> >javadoc fix for Literal.getString.
> >https://github.com/apache/jena/pull/2251
> >
> > ØyvindG @OyvindLGjesdal -
> >https://github.com/apache/jena/pull/2121
> > text index fix for
> >https://github.com/apache/jena/issues/2094
> >
> >   @wang3820 Tong Wang
> >Fix tests due to assumptions on hashmap order
> >https://github.com/apache/jena/pull/2098
> >
> > @thomasjtaylor Thomas J. Taylor
> >  Fix for NodeValueFloat
> >  https://github.com/apache/jena/pull/2374
> >
> > @Aklakan Claus Stadler
> > "Incorrect JoinClassifier results with unbound values."
> >https://github.com/apache/jena/issues/2412
> >
> > @Aklakan Claus Stadler
> >"QueryExec: abort before exec is ignored."
> >https://github.com/apache/jena/issues/2394
> >
> > @osi peter royal
> >Track rule engine instances
> >https://github.com/apache/jena/issues/2382
> >https://github.com/apache/jena/pull/2432
> >
> > Normalization/Canonicalization of values
> >Including RDFParserBuilder.canonicalValues
> >  This has been reworked to provide a consistent framework
> >  and also guarantee the same behavior between parsing
> >  and TDB2 handling of values.
> >https://github.com/apache/jena/issues/2557
> >
> > ---
> >
> > Checking:
> >
> > + are the GPG signatures fine?
> > + are the checksums correct?
> > + is there a source archive?
> > + can the source archive be built?
> >(NB This requires a "mvn install" first time)
> > + is there a correct LICENSE and NOTICE file in each artifact
> >(both source and binary artifacts)?
> > + does the NOTICE file contain all necessary attributions?
> > + have any licenses of dependencies changed due to upgrades?
> > if so have LICENSE and NOTICE been upgraded appropriately?
> > + does the tag/commit in the SCM contain reproducible sources?
>
>


Legal question: Is non-commercial licensing (like in CC BY-NC-SA) okay for test resources?

2024-06-09 Thread Arne Bernhardt
Hi,

the ENTSO-E published CIM/CGMES test data on CIM Conformity and
Interoperability. The Test Configurations v3.0.2 are published under a
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
License.

Is it possible to use the test data in test resources and unit tests within
Apache Jena? Apache Jena itself should be open to commercial usage, but
could the tests be considered non-commercial?

Should I contact legal-disc...@apache.org?

There is also the option to contact ENTSO-E directly.

Note:
The RDFS and SHACL contained in Application Profiles v3.0.1 are
licensed under the Apache License Version 2.0, so the standard itself is
open.

- Arne


Re: [VOTE][LAZY] Accept jena-ontapi into the Apache Jena codebase

2024-05-03 Thread Arne Bernhardt
+1

On Fri, 3 May 2024 at 19:17, Andy Seaborne wrote:

> This is a lazy consensus VOTE to accept a PR for a new module
> jena-ontapi that enhances Apache Jena with OWL2 support.
>
> https://github.com/apache/jena/pull/2420
> from @sszuev
>
> This vote is open until
>
> 05:00am UTC Wednesday 8th May 2024
>
> The code in this PR is within a new jena-ontapi module except for two
> changes in the top Jena POM to add the module into the build.
>
> There is no change to org.apache.jena.ontology code in jena-core.
>
> There are no new dependencies for downstream user/application code.
>
> PR README.md (temporary link)
>
>
> https://github.com/apache/jena/blob/f1584c53c9834a38248dbfca1121214c8cdff4d8/jena-ontapi/README.md
>
> Previous discussion:
> https://lists.apache.org/thread/yr9q394fssr0mvgxvrskynmhjlz0g33x
>
>  Andy
>


Re: [RESULT] [VOTE] Apache Jena 5.0.0

2024-04-17 Thread Arne Bernhardt
Hi Chris,

after my question was answered, I voted ("[x] +1 Approve the release" on
17.03.2024, 01:12)
So I guess Andy's summary was correct:
"The vote passes with 3 PMC members (Rob, Claude, Andy) and two
community votes from Arne and Marco."

Arne

On Wed, 17 Apr 2024 at 11:08, Christofer Dutz wrote:

> Well ... actually I don't count a vote by Arne, but a question. But as
> that's not a binding vote and there are enough of those, not a biggie ;-)
>
> Chris
>
> On 2024/03/20 08:17:32 Andy Seaborne wrote:
> >
> > The vote passes with 3 PMC members (Rob, Claude, Andy) and two
> > community votes from Arne and Marco.
> >
> > On to pushing out the release ...
> >
> >  Andy
> >
> > On 16/03/2024 18:32, Andy Seaborne wrote:
> > > Hi,
> > >
> > > Here is a vote on the release of Apache Jena version 5.0.0.
> > >
> > >  Release Vote
> > >
> > > This vote will be open until at least
> > >
> > >  Wednesday 20th March 2024 at 08:00 UTC
> > >
> > > Please vote to approve this release:
> > >
> > >  [ ] +1 Approve the release
> > >  [ ]  0 Don't care
> > >  [ ] -1 Don't release, because ...
> >
>


Re: [VOTE] Apache Jena 5.0.0

2024-03-16 Thread Arne Bernhardt
[x] +1 Approve the release

On Sat, 16 Mar 2024 at 22:49, Andy Seaborne wrote:

>
>
> On 16/03/2024 19:50, Arne Bernhardt wrote:
> > Hi,
> >
> > it may be nothing but on my system there are a few "ERROR "s in the
> > console that I can't categorise (see attached log). The general result
> > is a successful build.
> > For example on line 4250:
> > "[ERROR] There are test failures.
> > Failed to run task: 'yarn run test:e2e' failed.
> > com.github.eirslett.maven.plugins.frontend.lib.TaskRunnerException:
> > 'yarn run test:e2e' failed. ..."
>
> Hi Arne - thanks for checking the release.
>
> This is from the jena-fuseki-ui.
>
> It looks like a failure to run the test framework, not a test failure.
> The e2e test framework is sensitive to the environment.
>
> "Process exited with an error: 1 (Exit value: 1)" isn't the most
> informative of error messages :-)
>
> It may be because (despite the yarn,node download) something in the
> toolchain is an old version.
>
> I have:
>node --version => v18.19.1
>yarn --version => 1.22.19
>npm --version => 10.2.4
>
> I checked the bots. The github action for MS Windows runs it fine; the
> Jenkins Windows job has the report you have.
>
> This maven module produces jena-fuseki-ui-5.0.0.jar and this is unpacked
> in jena-fuseki-webapp:pom.xml by maven-dependency-plugin (a way to pass
> the built Vue app through the build artifacts).
>
> The jena-fuseki-webapp build step succeeded so it looks like
> jena-fuseki-ui jar was produced.
>
> The build I did for the release was on Linux and the e2e:test passed in
> the release build and all the subsequent checking.
>
> So it runs the tests, and they pass, sometimes.
> I think we can continue and address the issue as part of regular
> development if that's OK.
>
>  Andy
>
> >
> > Arne
> >
> > On Sat, 16 Mar 2024 at 19:34, Andy Seaborne wrote:
> >
> > Hi,
> >
> > Here is a vote on the release of Apache Jena version 5.0.0.
> >
> >  Release Vote
> >
> > This vote will be open until at least
> >
> >   Wednesday 20th March 2024 at 08:00 UTC
> >
> > Please vote to approve this release:
> >
> >   [ ] +1 Approve the release
> >   [ ]  0 Don't care
> >   [ ] -1 Don't release, because ...
> >
> > Everyone, not just committers, is invited to test and vote.
> > Please download and test the proposed release. See the checklist
> below.
> >
> > Staging repository:
> >
> > https://repository.apache.org/content/repositories/orgapachejena-1063
> >
> > Proposed dist/ area:
> > https://dist.apache.org/repos/dist/dev/jena/
> >
> > Keys:
> > https://svn.apache.org/repos/asf/jena/dist/KEYS
> >
> > Git commit (browser URL):
> > https://github.com/apache/jena/commit/f475cdc84a
> >
> > Git Commit Hash:
> > f475cdc84a85e48c22a2c6487141e2d782c10517
> >
> > Git Commit Tag:
> > jena-5.0.0
> >
> > If you expect to check the release but the time limit does not work
> > for you, please email within the schedule above.
> >
> >   Andy
> >
> >
> >  About Jena5 
> >
> > == General
> >
> > Issues since Jena 4.10.0:
> >
> > https://s.apache.org/jena-5.0.0-issues
> >
> > which includes the ones specifically related to Jena5:
> >
> > https://github.com/apache/jena/issues?q=label%3Ajena5
> >
> >
> > ** Java Requirement
> >
> > Java 17 or later is required.
> > Java 17 language constructs are now used in the codebase.
> >
> > ** Language tags
> >
> > Language tags are now case-insensitively unique.
> >
> > "abc"@EN and "abc"@en are the same RDF term.
> >
> > Internally, language tags are formatted using the algorithm of RFC 5646.
> >
> > Examples "@en", "@en-GB", "@en-Latn-
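The RFC 5646 case formatting described above can be illustrated with a small stand-in. This is an assumption-laden sketch of the conventional BCP 47 casing rules (lowercase language, Title-case 4-letter script subtags, uppercase 2-letter region subtags) and is not Jena's actual implementation:

```java
import java.util.Locale;

// Stand-in sketch of RFC 5646 case formatting, NOT Jena's code:
// lowercase language subtag, Title-case 4-letter (script) subtags,
// uppercase 2-letter (region) subtags.
public class LangTagFormat {
    static String format(String tag) {
        String[] parts = tag.split("-");
        StringBuilder out = new StringBuilder(parts[0].toLowerCase(Locale.ROOT));
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i];
            if (p.length() == 4) {          // script subtag, e.g. "Latn"
                p = Character.toUpperCase(p.charAt(0))
                        + p.substring(1).toLowerCase(Locale.ROOT);
            } else if (p.length() == 2) {   // region subtag, e.g. "GB"
                p = p.toUpperCase(Locale.ROOT);
            } else {
                p = p.toLowerCase(Locale.ROOT);
            }
            out.append('-').append(p);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "abc"@EN and "abc"@en denote the same RDF term once normalised.
        System.out.println(format("EN"));         // en
        System.out.println(format("en-gb"));      // en-GB
        System.out.println(format("en-latn-gb")); // en-Latn-GB
    }
}
```

Under this scheme, any case variant of a tag normalises to one canonical form, which is what makes case-insensitive uniqueness cheap to enforce.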

Re: [VOTE] Apache Jena 4.9.0 RC1

2023-07-07 Thread Arne Bernhardt
+1

Successfully tested in our "product" dev environment.

   Arne

On Tue, 4 Jul 2023 at 21:24, Andy Seaborne wrote:

> Hi,
>
> Here is a vote on the release of Apache Jena 4.9.0.
> This is the first release candidate.
>
> The deadline is
>
>  Saturday, 8th July 2023 at 05:00 UTC
>
> Please vote to approve this release:
>
>  [ ] +1 Approve the release
>  [ ]  0 Don't care
>  [ ] -1 Don't release, because ...
>
>  Items in this release
>
> Arne Bernhardt
> https://github.com/apache/jena/issues/1912
> New implementations of in-memory graphs with better storage and
> performance.
>
> See the issue for performance details.
>
> See GraphMemFactory for access to these new graph implementations.
>
> Arne has also provided a performance analysis and improvements for the
> existing default in-memory graphs together with a benchmarking framework
>https://github.com/apache/jena/pull/1279
>
> --
>
> Switch from TriplyDB/(yasr,yasqe) to zazuko/(yasr,yasqe)
> to pick up fixes.
> Thank you Zazuko!
>
> --
>
> SERVICE on/off control
> https://github.com/apache/jena/pull/1906
>
> Provide the ability to switch off all SERVICE processing completely.
> Use
>Code: arq:httpServiceAllowed
>or http://jena.apache.org/ARQ#httpServiceAllowed=false
> to disable.
>
> e.g.
>fuseki-server --set arq:httpServiceAllowed=false 
>
> --
>
> Additional restrictions and control for SPARQL script functions
>https://github.com/apache/jena/pull/1908
>
> There is a new Jena context setting
>http://jena.apache.org/ARQ#scriptAllowList
> which is on the command line:
>arq:scriptAllowList
> and java constant
>ARQ.symCustomFunctionScriptAllowList
>
> Its value is a comma separated list of function names.
>"function1,function2"
> Only the functions in this list can be called from SPARQL.
>
> As in Jena 4.8.0, the Java system property "jena:scripting" must also be
> set to "true" to enable script functions.
>Website (when published):
> https://jena.apache.org/documentation/query/javascript-functions
>
> --
>
> Prepare for Jena5:
>Deprecate  JSON-LD 1.0 constants
>Deprecate  API calls that may be removed.
>
> --
>
> Specific SPARQL 1.2 parser, tracking the RDF-star working group.
>All features are also available in the default SPARQL parser.
>
> --
> Ryan Shaw(@rybesh)
>new Turtle RDFFormat
>https://github.com/apache/jena/issues/1924
> --
> Simon Bin (@SimonBin)
>A fix for incorrect integer cast in scripting.NV
>https://github.com/apache/jena/pull/1851
> --
> Alexander Ilin-Tomich (@ailintom)
>Fix for SPARQL_Update verification and /HTTP PATCH
> --
> Ryan Shaw (@rybesh)
>Script fix for additional classpath elements
>https://github.com/apache/jena/pull/1877
> --
> FusekiModules:
> Issue: https://github.com/apache/jena/issues/1897
>
> There is a change in that the interface for automatically loading
> modules from the classpath has changed to FusekiAutoModule. The
> interface FusekiModule is now for the configuration lifecycle only. This
> is to allow a Fuseki server to be set up programmatically with Fuseki
> modules, including custom ones from the calling application.
>
> ===
>  Release Vote
>
> Everyone, not just committers, is invited to test and vote.
> Please download and test the proposed release.
>
> Staging repository:
>https://repository.apache.org/content/repositories/orgapachejena-1059
>
> Proposed dist/ area:
>https://dist.apache.org/repos/dist/dev/jena/
>
> Keys:
>https://svn.apache.org/repos/asf/jena/dist/KEYS
>
> Git commit (browser URL):
>https://github.com/apache/jena/commit/84aa91e095
>
> Git Commit Hash:
>84aa91e095e20e0e3c7a55c9780f285ef8fb54bb
>
> Git Commit Tag:
>jena-4.9.0
>
> This vote will be open until at least
>
>Saturday, 8th July 2023 at 05:00 UTC
>
> If you expect to check the release but the time limit does not work
> for you, please email within the schedule above.
>
>  Thanks,
>  Andy
>
> Checking:
>
> + are the GPG signatures fine?
> + are the checksums correct?
> + is there a source archive?
> + can the source archive be built?
>(NB This requires a "mvn install" first time)
> + is there a correct LICENSE and NOTICE file in each artifact
>(both source and binary artifacts)?
> + does the NOTICE file contain all necessary attributions?
> + have any licenses of dependencies changed due to upgrades?
> if so have LICENSE and NOTICE been upgraded appropriately?
> + does the tag/commit in the SCM contain reproducible sources?
>
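The allow-list semantics described in the release notes above can be illustrated with a stdlib stand-in: the setting's value is a comma-separated list of function names, and only listed functions may be called. This sketch is not Jena's implementation; in Jena the value is carried by the context symbol ARQ.symCustomFunctionScriptAllowList and the "jena:scripting" system property must also be "true", as the mail notes.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for the script-function allow-list check described above.
// NOT Jena's actual code; it only illustrates the stated semantics.
public class ScriptAllowList {
    static boolean allowed(String allowList, String functionName) {
        List<String> names = Arrays.asList(allowList.split(","));
        return names.contains(functionName);
    }

    public static void main(String[] args) {
        String allowList = "function1,function2"; // format from the release notes
        System.out.println(allowed(allowList, "function1")); // true
        System.out.println(allowed(allowList, "function3")); // false
    }
}
```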


Re: Towards Jena 4.9.0

2023-06-23 Thread Arne Bernhardt
The switch to term-equality might break some code that uses the current
default implementation.
A switch in the GraphMemFactory in Jena 5.x to make it backwards compatible
seems to be a good option.
In this case, the general Jena codebase should remain compatible with the
literal value equality semantics.
As far as I know, org.apache.jena.graph.Capabilities#handlesLiteralTyping
should be used to control the behaviour here. My guess is, we might find
some places where it is not considered yet, because GraphMem has been the
default for so many years.

If there is not enough time to evaluate GraphMem2Fast over the summer, it
may be wise to start with GraphMem2Legacy as the default in Jena 5.x.
If the community sees a real advantage in GraphMem2Fast, we could make it
the new default in a later version.

   Arne
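The distinction between term equality and literal value equality discussed above can be sketched with stand-in types. These are hypothetical classes for illustration only, not Jena's Node/NodeValue machinery: under term semantics "01"^^xsd:integer and "1"^^xsd:integer are different terms, while under value semantics they are the same value.

```java
import java.math.BigInteger;

// Illustrative stand-in types, NOT Jena's API.
public class LiteralEquality {
    record Literal(String lexicalForm, String datatype) {}

    // Term equality: identical lexical form and datatype.
    static boolean sameTerm(Literal a, Literal b) {
        return a.equals(b);
    }

    // Value equality, sketched for xsd:integer only: compare parsed values.
    static boolean sameValue(Literal a, Literal b) {
        if (a.datatype().equals("xsd:integer") && b.datatype().equals("xsd:integer"))
            return new BigInteger(a.lexicalForm())
                    .equals(new BigInteger(b.lexicalForm()));
        return sameTerm(a, b);
    }

    public static void main(String[] args) {
        Literal one = new Literal("1", "xsd:integer");
        Literal paddedOne = new Literal("01", "xsd:integer");
        System.out.println(sameTerm(one, paddedOne));  // false
        System.out.println(sameValue(one, paddedOne)); // true
    }
}
```

A graph with value semantics treats the two literals as duplicates; a term-equality graph stores both, which is why switching the default can break code that relied on the old behaviour.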

On Fri, 23 Jun 2023 at 13:08, Andy Seaborne wrote:

>
>
> On 22/06/2023 21:08, Arne Bernhardt wrote:
> > Do you think it would be possible to integrate
> > https://github.com/apache/jena/issues/1912 in Jena  4.9.0 ?
> > So there would be enough time and feedback to see if it can replace
> > GraphMem as default in Jena 5.0.0?
> >
> >   Arne
>
> Yes.
>
> A switch to term-semantics by default in graph/model is a 5.x thing but
> the code can be available. Feedback would be good but we can't rely on
> that; everyone is time-short.
>
> So would this be extra calls in ModelFactory?
> Possibly with a single switch so that the default can be made into one
> of the new term graphs? These Models and Graphs get created implicit as
> well as by application calls to ModelFactory.
>
>  Andy
>
> Let's rename org.apache.jena.graph.Factory to
> org.apache.jena.graph.GraphMemFactory at 5.0.0
> It's annoying.
>
> https://github.com/apache/jena/issues/1919
> and PR 1920 to start the process.
>


Re: Towards Jena 4.9.0

2023-06-22 Thread Arne Bernhardt
Do you think it would be possible to integrate
https://github.com/apache/jena/issues/1912 in Jena  4.9.0 ?
So there would be enough time and feedback to see if it can replace
GraphMem as default in Jena 5.0.0?

 Arne


On Thu, 22 Jun 2023 at 12:55, Andy Seaborne wrote:

> Jena 4.8.0 was released 23/04/2023.
>And Java 21 LTS is September 19th.
>https://openjdk.org/projects/jdk/21/
>
> So it's a little early for 4.9.0 but it fits in better to keep away from
> summer and vacations.
>
> At the moment:
>https://s.apache.org/jena-4.9.0-issues
>
> jena-4.9.0 is 18 issues closed in 2 months and 36 PRs
>
>  Andy
>
> ---
>
> Specific SPARQL 1.2 parser, tracking the RDF-star working group.
>All features are also available in the default SPARQL parser.
>
> Arne Bernhardt has provided a performance analysis and
>improvements for the default in-memory graphs together
>with a benchmarking framework
>https://github.com/apache/jena/pull/1279
>
> FusekiModules:
> Issue: https://github.com/apache/jena/issues/1897
>
> There is a change in that the interface for automatically loading
> modules from the classpath has changed to FusekiAutoModule. The
> interface FusekiModule is now for the configuration lifecycle only. This
> is to allow a Fuseki server to be set up programmatically with Fuseki
> modules, including custom ones from the calling application.
>
> Simon Bin (@SimonBin)
> A fix for incorrect integer cast in scripting.NV
> https://github.com/apache/jena/pull/1851
>
> Alexander Ilin-Tomich (@ailintom)
> Fix for SPARQL_Update verification and /HTTP PATCH
>
> Issue: https://github.com/apache/jena/issues/1873
> Command line parser riot
> Warn on arguments that allow quads but output triples
>And error/warn if quads encountered
> Add argument --merge to project quads to triples.
>
> Ryan Shaw (@rybesh)
> Script fix for additional classpath elements
> https://github.com/apache/jena/pull/1877
>
> SERVICE on/off control
> https://github.com/apache/jena/pull/1906
>
> Provide the ability to switch off all SERVICE processing completely.
> Use
>arq:httpServiceAllowed
>http://jena.apache.org/ARQ#httpServiceAllowed=false
> to disable.
>
> e.g.
>fuseki-server --set arq:httpServiceAllowed=false 
>
> Additional restrictions and control for SPARQL script functions
> https://github.com/apache/jena/pull/1908
>
> There is a new Jena context setting
>http://jena.apache.org/ARQ#scriptAllowList
> which is on the command line:
>arq:scriptAllowList
> and java constant
>ARQ.symCustomFunctionScriptAllowList
>
> Its value is a comma separated list of function names.
>"function1,function2"
> Only the functions in this list can be called from SPARQL.
>
> As in Jena 4.8.0, the Java system property "jena:scripting" must also be
> set to "true" to enable script functions.
>Website (when published):
> https://jena.apache.org/documentation/query/javascript-functions
>


Re: Why DatasetGraphInMemory?

2023-06-17 Thread Arne Bernhardt
Hi Andy,

in the meantime, I've been using my bulk update tests (e.g. in a 1M graph,
updating about 200,000 triples over and over again) to observe whether it
slows down and how memory is freed up when I call GC between iterations and
measure memory. RoaringBitmaps seem to be quite GC-friendly. There was no
increase in memory usage. It even seems to be more GC-friendly than
GraphMem in the same scenario.
So I decided to create https://github.com/apache/jena/issues/1912.

   Arne

On Sat, 17 Jun 2023 at 17:33, Andy Seaborne wrote:

>
>
> On 12/06/2023 21:36, Arne Bernhardt wrote:
> > Hi Andy
> >
> > you mentioned RoaringBitmaps. I took the time to experiment with them.
> > They are really amazing. The performance of #add, #remove and #contains
> is
> > comparable to Java HashSet. RoaringBitmaps are much faster at iterating
> > over values and they perform bit operations even between two quite large
> > bitmaps like a charm. RoaringBitmaps also need less memory than a
> > Java HashSet (even less than an optimized integer hash set based on the
> > concepts in HashCommon).
> > A first graph implementation was easy to create. (albeit with a little
> help
> > from ChatGPT, as I had no idea how to use RoaringBitmaps yet).
> > One only needs an indexed set of all triples and three maps indexed by
> > subject, predicate and object and bitmaps as values.
> > Each bitmap contains all indices of the triples with the corresponding
> node.
> > To find SPO --> use the set with all triples.
> > To find S__, _P_, or __O --> lookup the bitmap in the corresponding map
> and
> > iterate over all indices mapping to triples via the indexed set.
> > To find SP_, S_O, or _PO --> lookup the two bitmaps for both given nodes,
> > perform an "and" operation with both bitmaps and again iterate over the
> > resulting indices mapping to triples via the indexed set.
> > Especially the query of _PO is incredibly fast compared to GraphMem or
> > similarly structured graphs.
> > Just for fun, I replaced the bitmaps with two sets of integers and
> > simulated the "and" operation by iterating over the smallest set and
> > checking the entries in the larger set using #contains --> it is 10-100
> > times slower than the "and" operation of RoaringBitmaps.
> > Now I really understand the hype around RoaringBitmaps. It seems
> absolutely
> > justified to me.
> > Smaller graphs with RoaringBitmaps need about twice as much memory for
> the
> > indexing structures (triples excluded) as GraphMem.
> > (The additional memory requirement is not only due to the bitmaps, but
> also
> > to the additional indexed set of triples).
> > For larger graphs (> 500k and above), this gap begins to close. At 1M
> > triples, the variant with roaring bitmaps wins the advantage with 88MB
> > compared to 106MB with GraphMem.
> > After loading all the triples from bsbm-25m.nt.gz and two JVM warmup
> > iterations, it only took about 18 seconds to add them to the new graph,
> and
> > this graph only required an additional 1941 MB of memory.
> >
> > I'm not sure how RoaringBitmaps handles permanent updates. I have tried
> > many #add and #remove calls on larger graphs and they seem to work well.
> > But there are two methods that caught my attention:
> > *
> >
> https://javadoc.io/doc/org.roaringbitmap/RoaringBitmap/latest/org/roaringbitmap/RoaringBitmap.html#runOptimize()
> > *
> >
> https://javadoc.io/doc/org.roaringbitmap/RoaringBitmap/latest/org/roaringbitmap/RoaringBitmap.html#trim()
> > I have no idea when it would be a good time to use them.
> > Removing and adding triples from a graph of size x in y iterations and
> > measuring the impact on memory and performance could be one way to find
> > potential problems.
> > Do you have a scenario in mind that I could use to test if I ever need
> one
> > of these methods?
>
> Just from reading the javadoc - #runOptimize() might be useful for a
> load-and-readonly graph - do a lot of loading work and switch to the
> more efficient form. It depends on how much space it saves. My instinct is
> that the saving for the overall graph may not be that great because the
> RDF terms take up a lot of the space at scale, so savings on the
> bitmaps might, overall, not be significant.
>
> >
> > Arne
> >
> > On Mon, 22 May 2023 at 16:52, Andy Seaborne wrote:
> >
> >>
> >>
> >> On 20/05/2023 17:18, Arne Bernhardt wrote:
> >>> Hi Andy,
> >>> thank you, that was very helpful to get the whole picture.
> >>>
> >>>

Re: Why DatasetGraphInMemory?

2023-06-12 Thread Arne Bernhardt
Hi Andy

you mentioned RoaringBitmaps. I took the time to experiment with them.
They are really amazing. The performance of #add, #remove and #contains is
comparable to Java HashSet. RoaringBitmaps are much faster at iterating
over values and they perform bit operations even between two quite large
bitmaps like a charm. RoaringBitmaps also need less memory than a
Java HashSet (even less than an optimized integer hash set based on the
concepts in HashCommon).
A first graph implementation was easy to create. (albeit with a little help
from ChatGPT, as I had no idea how to use RoaringBitmaps yet).
One only needs an indexed set of all triples and three maps indexed by
subject, predicate and object and bitmaps as values.
Each bitmap contains all indices of the triples with the corresponding node.
To find SPO --> use the set with all triples.
To find S__, _P_, or __O --> lookup the bitmap in the corresponding map and
iterate over all indices mapping to triples via the indexed set.
To find SP_, S_O, or _PO --> lookup the two bitmaps for both given nodes,
perform an "and" operation with both bitmaps and again iterate over the
resulting indices mapping to triples via the indexed set.
Especially the query of _PO is incredibly fast compared to GraphMem or
similarly structured graphs.
Just for fun, I replaced the bitmaps with two sets of integers and
simulated the "and" operation by iterating over the smallest set and
checking the entries in the larger set using #contains --> it is 10-100
times slower than the "and" operation of RoaringBitmaps.
Now I really understand the hype around RoaringBitmaps. It seems absolutely
justified to me.
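The indexing scheme described above can be sketched with a stdlib stand-in. This uses java.util.BitSet in place of RoaringBitmap (and Strings in place of Jena nodes) purely to illustrate the structure: an indexed list of all triples, plus per-node bitmaps of triple indices for S, P and O, with _PO answered by an "and" of two bitmaps:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in sketch of the scheme described above; java.util.BitSet
// replaces RoaringBitmap, plain Strings replace RDF nodes.
public class BitmapTripleIndex {
    record Triple(String s, String p, String o) {}

    private final List<Triple> triples = new ArrayList<>();       // indexed set
    private final Map<String, BitSet> bySubject = new HashMap<>();
    private final Map<String, BitSet> byPredicate = new HashMap<>();
    private final Map<String, BitSet> byObject = new HashMap<>();

    void add(Triple t) {
        int idx = triples.size();
        triples.add(t);
        bySubject.computeIfAbsent(t.s(), k -> new BitSet()).set(idx);
        byPredicate.computeIfAbsent(t.p(), k -> new BitSet()).set(idx);
        byObject.computeIfAbsent(t.o(), k -> new BitSet()).set(idx);
    }

    // _PO lookup: "and" the predicate and object bitmaps, then map the
    // surviving indices back to triples via the indexed list.
    List<Triple> findByPO(String p, String o) {
        BitSet hits = (BitSet) byPredicate.getOrDefault(p, new BitSet()).clone();
        hits.and(byObject.getOrDefault(o, new BitSet()));
        List<Triple> result = new ArrayList<>();
        for (int i = hits.nextSetBit(0); i >= 0; i = hits.nextSetBit(i + 1))
            result.add(triples.get(i));
        return result;
    }

    public static void main(String[] args) {
        BitmapTripleIndex index = new BitmapTripleIndex();
        index.add(new Triple("s1", "p1", "o1"));
        index.add(new Triple("s2", "p1", "o1"));
        index.add(new Triple("s1", "p2", "o2"));
        System.out.println(index.findByPO("p1", "o1").size()); // 2
    }
}
```

The intersection step is where RoaringBitmap pays off: its compressed containers make the "and" over large bitmaps far cheaper than the set-iteration approach compared above.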
Smaller graphs with RoaringBitmaps need about twice as much memory for the
indexing structures (triples excluded) as GraphMem.
(The additional memory requirement is not only due to the bitmaps, but also
to the additional indexed set of triples).
For larger graphs (> 500k and above), this gap begins to close. At 1M
triples, the variant with roaring bitmaps comes out ahead, at 88 MB
compared to 106 MB for GraphMem.
After loading all the triples from bsbm-25m.nt.gz and two JVM warmup
iterations, it only took about 18 seconds to add them to the new graph, and
this graph only required an additional 1941 MB of memory.

I'm not sure how RoaringBitmaps handle continuous updates. I have tried
many #add and #remove calls on larger graphs and they seem to work well.
But there are two methods that caught my attention:
*
https://javadoc.io/doc/org.roaringbitmap/RoaringBitmap/latest/org/roaringbitmap/RoaringBitmap.html#runOptimize()
*
https://javadoc.io/doc/org.roaringbitmap/RoaringBitmap/latest/org/roaringbitmap/RoaringBitmap.html#trim()
I have no idea when it would be a good time to use them.
Removing and adding triples from a graph of size x in y iterations and
measuring the impact on memory and performance could be one way to find
potential problems.
Do you have a scenario in mind that I could use to test whether I ever need
one of these methods?

   Arne

Andy Seaborne wrote on Mon, 22 May 2023 at 16:52:

>
>
> On 20/05/2023 17:18, Arne Bernhardt wrote:
> > Hi Andy,
> > thank you, that was very helpful to get the whole picture.
> >
> > Some time ago, I told you that at my workplace we implemented an
> in-memory
> > SPARQL-Server based on a Delta
> > <
> https://jena.apache.org/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/graph/compose/Delta.html
> >
> > .
> > We started a few years ago, before RDF-patch
> > <https://jena.apache.org/documentation/rdf-patch/>, based on the
> "difference
> > model"
> > <https://lists.w3.org/Archives/Public/www-rdf-interest/2001Mar/0216.html
> >,
> > that has become part of the CGMES standard.
> > For our server, we strictly follow the CQRS with event-sourcing
> > <https://learn.microsoft.com/en-us/azure/architecture/patterns/cqrs>
> > pattern. All transactions are recorded as an event with a list of triples
> > added and a list of triples removed.
> > The events are stored in an RDBMS (Oracle or PostgreSQL). For query
> > execution we need the relevant data to fit into memory but all data and
> > versions are also persisted.
> > To be able to store and load graphs very fast, we use RDF Thrift with LZ4
> > compression and store them in blobs.
> > All queries are executed on projected datasets for the requested version
> > (any previous version) of the data and the requested named graphs.
> > Thanks to the versioning, we fully support MR+SW. We even support
> multiple
> > writers, with a git-like branching and merging approach and optimistic
> > locking.
>
> How does that work for RDF?
>
> Is it at the unit of an "entity"?
>
> > To prevent the chain of deltas from

Re: Why DatasetGraphInMemory?

2023-05-24 Thread Arne Bernhardt
On Mon, 22 May 2023 at 16:52, Andy Seaborne wrote:

>
>
> On 20/05/2023 17:18, Arne Bernhardt wrote:
> > Hi Andy,
> > thank you, that was very helpful to get the whole picture.
> >
> > Some time ago, I told you that at my workplace we implemented an
> in-memory
> > SPARQL-Server based on a Delta
> > <
> https://jena.apache.org/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/graph/compose/Delta.html
> >
> > .
> > We started a few years ago, before RDF-patch
> > <https://jena.apache.org/documentation/rdf-patch/>, based on the
> "difference
> > model"
> > <https://lists.w3.org/Archives/Public/www-rdf-interest/2001Mar/0216.html
> >,
> > that has become part of the CGMES standard.
> > For our server, we strictly follow the CQRS with event-sourcing
> > <https://learn.microsoft.com/en-us/azure/architecture/patterns/cqrs>
> > pattern. All transactions are recorded as an event with a list of triples
> > added and a list of triples removed.
> > The events are stored in an RDBMS (Oracle or PostgreSQL). For query
> > execution we need the relevant data to fit into memory but all data and
> > versions are also persisted.
> > To be able to store and load graphs very fast, we use RDF Thrift with LZ4
> > compression and store them in blobs.
> > All queries are executed on projected datasets for the requested version
> > (any previous version) of the data and the requested named graphs.
> > Thanks to the versioning, we fully support MR+SW. We even support
> multiple
> > writers, with a git-like branching and merging approach and optimistic
> > locking.
>
> How does that work for RDF?


It is more like a rebase+merge: we merge the changes in the branch into
sets of removed and added triples per graph. The branch we want to merge
into is checked to see if each triple to be removed exists and each triple
to be added does not exist. If this condition is violated for any triple,
there were concurrent changes and we do not allow the merge.
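As a sketch, that check can be written like this (plain strings stand in for triples; this is illustrative, not our actual code):

```java
import java.util.Set;

// Optimistic merge check: reject the merge if any triple to be removed is
// already gone from the target branch, or any triple to be added is already
// present -- either case indicates a concurrent change.
public class MergeCheck {
    static boolean canMerge(Set<String> targetBranch,
                            Set<String> toRemove, Set<String> toAdd) {
        for (String t : toRemove)
            if (!targetBranch.contains(t)) return false; // concurrently removed
        for (String t : toAdd)
            if (targetBranch.contains(t)) return false;  // concurrently added
        return true;
    }

    public static void main(String[] args) {
        Set<String> target = Set.of("t1", "t2");
        System.out.println(canMerge(target, Set.of("t1"), Set.of("t3"))); // true
        System.out.println(canMerge(target, Set.of("t9"), Set.of()));     // false
    }
}
```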


> Is it at the unit of an "entity"?
>

The core part is only about graphs and triples. For entities we have
another layer based on RDFS.


> > To prevent the chain of deltas from becoming too long, we have several
> > strategies to write full graphs into snapshots.
> >
> > DatasetGraphInMemory seems to be 5-7 times slower than GraphMem, at least
> > when adding triples. So it might be worth trying to meet your constraints
> > while leaving the poor performance behind.
> > My basic idea is:
> > - having two instances "writer graph" and "reader graph" of GraphMem and
> > switching between them
> > - Unfortunately that would mean a doubling of memory. (but
> > DatasetGraphInMemory seems to use approx. 3 times the memory of GraphMem)
>
> There may be a better persistent datastructure than Dexx, have you looked
> around? Seems a good idea to adopt an existing library because getting
> robustness and concurrency safety right is not trivial.
>
>
https://github.com/romix/java-concurrent-hash-trie-map is an implementation
of a Ctrie <https://en.wikipedia.org/wiki/Ctrie> which could be compared to
Dexx.
It could be interesting to do some benchmarks.
One of the most common use-cases seems to be fast iterations over the
triples, mostly filtered by some criteria.
(at least for fast SPARQL queries, that seems to be the key performance
indicator)
For any graph up to the size of bsbm-5m.nt.gz I could not find a faster
solution than the approach of HashCommon in GraphMem.
Any tree structure "breaks" fast iterations, and even the Java HashMap is
clearly slower at iterating over the values, due to the linked lists inside.
Now that I have applied the optimized iterators to GraphMem, none of my
alternative variants clearly beats GraphMem in a sufficient number of
disciplines.
Only for bigger graphs, which are irrelevant for my use cases, does indexing
more than one level seem to have advantages.


> > - implement a dataset that performs the write transactions on the "writer
> > graph" and in deltas
>
> Jena now has BufferingDatasetGraph and BufferingGraph.
>
> These capture changes while leaving the underlying dataset or graph
> untouched until a flush is done.
>
> Only one copy and the diffs.
>
> If you have two copies you can atomically swap to get MR+SW - buffering
> flush is by default a write-txn on the underlying store.
>
> Two copies in memory isn't so bad if the strings (lexical forms, URIs)
> are shared.
>
> A parser run does this with an LRU cache - but it isn't guaranteeing
> uniquely intern'ed strings. (It saves about 30% on long parser runs.)
>
> You coul

Re: Why DatasetGraphInMemory?

2023-05-22 Thread Arne Bernhardt
Hi Rob,
thanks for sharing your perspective.

The in-memory graphs are exactly what I work with most and where I might
have the skills to contribute to the project.
My idea might actually only be relevant to the --mem use case.

I have no idea about the inner workings of TDB, as we (at work) needed a
temporary database where all changes are versioned.

Arne



On Mon, 22 May 2023 at 11:07, Rob @ DNR wrote:

> Fuseki is effectively the Jena project's database server that allows
> sharing a single Jena Dataset amongst many processes and users.
>
> This means that users expect database-server-like behaviour, i.e.,
> transactions and read isolation, which the transactional in-memory dataset
> provides, when running Fuseki in the in-memory mode.
>
> I’m not sure about the full context of that comment but I don’t think
> that’s entirely true.  It depends on how the user starts and runs Fuseki.
> Most people who want a persistent dataset would be using TDB which has its
> own completely independent Dataset implementation, query executor and
> persistent data structures.
>
> Broadly speaking users of Fuseki run it in 3 main ways:
>
>
>   *   With TDB (the --loc=/path/to/db flag)
>   *   In Memory (the --mem flag)
>   *   With a configuration file (--config flag)
>
> For 1, DatasetGraphInMemory doesn’t get used AFAIK; the TDB-specific
> implementations are used instead.  For 2, it’s the default dataset.  For 3,
> it will depend on what the user has placed in their configuration file and
> might be a mixture of 1 and 2 plus inference, ancillary index wrappers
> (text/geospatial indexing) etc.
>
> Again, I think you’re getting hung up on the wrong thing here. An improved
> in-memory Graph implementation will have benefits, but it won’t necessarily
> be for all use cases.  There’s plenty of use cases where you do just want
> to briefly load/generate a bunch of RDF in-memory, manipulate it and move
> on, which an improved in-memory implementation will greatly benefit.
>
> Fuseki, as a database server, likely won’t benefit (except perhaps in some
> people’s custom configuration setups).  However, people who want performance
> with Fuseki should already be using TDB anyway.
>
> Hope this helps,
>
> Rob
>
> From: Arne Bernhardt 
> Date: Friday, 19 May 2023 at 21:21
> To: dev@jena.apache.org 
> Subject: Why DatasetGraphInMemory?
> Hi,
> in a recent  response
> <https://github.com/apache/jena/issues/1867#issuecomment-1546931793> to an
> issue it was said that   "Fuseki - uses DatasetGraphInMemory mostly"  .
> For my  PR <https://github.com/apache/jena/pull/1865>, I added a JMH
> benchmark suite to the project. So it was easy for me to compare the
> performance of GraphMem with
> "DatasetGraphFactory.createTxnMem().getDefaultGraph()".
> DatasetGraphInMemory is much slower in every discipline tested (#add,
> #delete, #contains, #find, #stream).
> Maybe my approach is too naive?
> I understand very well that the underlying Dexx Collections Framework, with
> its immutable persistent data structures, makes threading and transaction
> handling easy and that there are no issues with consuming iterators or
> streams even after a read transaction has closed.
> Is it currently supported for consumers to use iterators and streams after
> a transaction has been closed? If so, I don't currently see an easy way to
> replace DatasetGraphInMemory with a faster implementation. (although
> transaction-aware iterators that copy the remaining elements into lists
> could be an option).
> Are there other reasons why DatasetGraphInMemory is the preferred dataset
> implementation for Fuseki?
>
> Cheers,
> Arne
>


Re: Why DatasetGraphInMemory?

2023-05-20 Thread Arne Bernhardt
Hi Andy,
thank you, that was very helpful to get the whole picture.

Some time ago, I told you that at my workplace we implemented an in-memory
SPARQL-Server based on a Delta
<https://jena.apache.org/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/graph/compose/Delta.html>
.
We started a few years ago, before RDF-patch
<https://jena.apache.org/documentation/rdf-patch/>, based on the "difference
model"
<https://lists.w3.org/Archives/Public/www-rdf-interest/2001Mar/0216.html>,
that has become part of the CGMES standard.
For our server, we strictly follow the CQRS with event-sourcing
<https://learn.microsoft.com/en-us/azure/architecture/patterns/cqrs>
pattern. All transactions are recorded as an event with a list of triples
added and a list of triples removed.
The events are stored in an RDBMS (Oracle or PostgreSQL). For query
execution we need the relevant data to fit into memory but all data and
versions are also persisted.
To be able to store and load graphs very fast, we use RDF Thrift with LZ4
compression and store them in blobs.
All queries are executed on projected datasets for the requested version
(any previous version) of the data and the requested named graphs.
Thanks to the versioning, we fully support MR+SW. We even support multiple
writers, with a git-like branching and merging approach and optimistic
locking.
To prevent the chain of deltas from becoming too long, we have several
strategies to write full graphs into snapshots.

DatasetGraphInMemory seems to be 5-7 times slower than GraphMem, at least
when adding triples. So it might be worth trying to meet your constraints
while leaving the poor performance behind.
My basic idea is:
- having two instances "writer graph" and "reader graph" of GraphMem and
switching between them
   - Unfortunately that would mean a doubling of memory. (but
DatasetGraphInMemory seems to use approx. 3 times the memory of GraphMem)
- implement a dataset that performs the write transactions on the "writer
graph" and in deltas
  - meanwhile any number of readers could read from the previous version by
reading the "reader graph"
  - when the writer transaction finishes
 - if there are no readers using the "reader graph", it can be swapped
with the "writer graph" and the former "reader graph" can be updated using
the deltas
- when the next writer starts, it would again write into the "writer graph"
  - meanwhile any number of readers could read from previous version(s) by
reading "reader graph" [+ deltas]
- when the last reader of the oldest version closes the transaction
- if there are no writers using the "writer graph", it can be reversed
using deltas in reverse order. Then "reader graph" can be swapped with the
"writer graph" for the next read transaction. Then the former "reader
graph" needs to be updated using all deltas.
- only if there is never a point in time without overlapping read and write
transactions, an occasional short pause may be needed to clear the deltas
that pile up.
- It is not lock-free
- there would be no background tasks or scheduling involved, "only" having
each graph twice and all #add and #remove operations would have to be done
3-4 times.
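A rough sketch of the swap mechanics (plain Sets stand in for GraphMem instances; the names are mine, and this ignores the reader tracking and locking described above):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Double-buffered graph: writers mutate the "writer graph"; on commit the
// roles swap atomically and the delta is replayed onto the former "reader
// graph" so both copies converge.
public class DoubleBufferSketch {
    private Set<String> writerGraph = new HashSet<>();
    private final AtomicReference<Set<String>> readerGraph =
            new AtomicReference<>(new HashSet<>());

    public void commit(Set<String> added, Set<String> removed) {
        writerGraph.addAll(added);
        writerGraph.removeAll(removed);
        // Swap: new read transactions see the just-committed state.
        Set<String> formerReader = readerGraph.getAndSet(writerGraph);
        // Replay the delta onto the former reader graph; it becomes the
        // writer graph for the next write transaction.
        formerReader.addAll(added);
        formerReader.removeAll(removed);
        writerGraph = formerReader;
    }

    public Set<String> snapshotForRead() { return readerGraph.get(); }
}
```

This also shows why every #add and #remove ends up being applied more than once, as noted above.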

The idea is still a bit blurry in my head. What do you think?

 Arne

On Sat, 20 May 2023 at 15:19, Andy Seaborne wrote:

> Hi Arne,
>
> On 19/05/2023 21:21, Arne Bernhardt wrote:
> > Hi,
> > in a recent  response
> > <https://github.com/apache/jena/issues/1867#issuecomment-1546931793> to
> an
> > issue it was said that   "Fuseki - uses DatasetGraphInMemory mostly"  .
> > For my  PR <https://github.com/apache/jena/pull/1865>, I added a JMH
> > benchmark suite to the project. So it was easy for me to compare the
> > performance of GraphMem with
> > "DatasetGraphFactory.createTxnMem().getDefaultGraph()".
> > DatasetGraphInMemory is much slower in every discipline tested (#add,
> > #delete, #contains, #find, #stream).
> > Maybe my approach is too naive?
> > I understand very well that the underlying Dexx Collections Framework,
> with
> > its immutable persistent data structures, makes threading and transaction
> > handling easy
>
> DatasetGraphInMemory (TIM = Transactions In Memory) has one big advantage.
>
> It supports multiple-readers and a single-writer (MR+SW) at the same
> time - truly concurrent. So does TDB2 (TDB1 is sort of hybrid).
>
> MR+SW has a cost which is a copy-on-write overhead, a reader-centric
> design choice allowing the readers to run latch-free.
>
> You can't directly use a regular hash map with concurrent updates. (And
> no, ConcurrentHashMap does not solve all problems, even for a single
> datastructure

Why DatasetGraphInMemory?

2023-05-19 Thread Arne Bernhardt
Hi,
in a recent  response
 to an
issue it was said that   "Fuseki - uses DatasetGraphInMemory mostly"  .
For my  PR , I added a JMH
benchmark suite to the project. So it was easy for me to compare the
performance of GraphMem with
"DatasetGraphFactory.createTxnMem().getDefaultGraph()".
DatasetGraphInMemory is much slower in every discipline tested (#add,
#delete, #contains, #find, #stream).
Maybe my approach is too naive?
I understand very well that the underlying Dexx Collections Framework, with
its immutable persistent data structures, makes threading and transaction
handling easy and that there are no issues with consuming iterators or
streams even after a read transaction has closed.
Is it currently supported for consumers to use iterators and streams after
a transaction has been closed? If so, I don't currently see an easy way to
replace DatasetGraphInMemory with a faster implementation. (although
transaction-aware iterators that copy the remaining elements into lists
could be an option).
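Such a transaction-aware iterator could look roughly like this (a hypothetical sketch, not an existing Jena class):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// While the transaction is open, this streams from the underlying iterator;
// when the transaction closes, the remaining elements are copied into a list
// so the consumer can keep iterating safely.
public class DetachableIterator<T> implements Iterator<T> {
    private Iterator<T> source;

    public DetachableIterator(Iterator<T> source) { this.source = source; }

    /** Called by the transaction machinery when the read txn closes. */
    public void detach() {
        List<T> rest = new ArrayList<>();
        source.forEachRemaining(rest::add);
        source = rest.iterator();
    }

    @Override public boolean hasNext() { return source.hasNext(); }
    @Override public T next() { return source.next(); }
}
```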
Are there other reasons why DatasetGraphInMemory is the preferred dataset
implementation for Fuseki?

Cheers,
Arne


Re: Resolving against bad URI - parsing CIM RDF/XML reference data for CGMES with Jena 4.8.SNAPSHOT

2023-03-04 Thread Arne Bernhardt
Hello Andy,
thank you very much for your quick and detailed help!
"urn:uid:abc" seems to work for my benchmarks. This allows me to work with
typed literals from the real world for my contribution.
Graphs containing typed literals may have a different distribution of hash
values and different equality implementations will be used.

The information and references given will probably be useful at work when
we update Jena. The "base" might play a bigger role there.
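As a quick illustration of the urn:NID:NSS point from Andy's reply below, here is a simplified stdlib check (a hypothetical helper, not a Jena API; it ignores the rule that the NID needs at least two characters):

```java
import java.net.URI;

// "urn:uuid" is legal URI syntax, but RFC 8141 requires a URN to have the
// shape urn:NID:NSS, i.e. a second colon separating the NID from the NSS.
public class UrnShapeCheck {
    static boolean hasUrnShape(String s) {
        URI u = URI.create(s);
        return "urn".equalsIgnoreCase(u.getScheme())
                && u.getSchemeSpecificPart().contains(":");
    }

    public static void main(String[] args) {
        System.out.println(hasUrnShape("urn:uuid")); // false: no NSS part
        System.out.println(
            hasUrnShape("urn:uuid:671940cc-e6b5-47ad-9992-2d9185f53464")); // true
    }
}
```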

Regards
Arne


On Sat, 4 Mar 2023 at 17:37, Andy Seaborne wrote:

> Hi Arne,
>
> Thanks for testing 4.8.0-SNAPSHOT.
>
> Part of #1773 is to change to the same IRI handling used elsewhere in
> Jena. While still based on jena-iri, the IRIx layer has a specific set
> of scheme-specific rules. Pure jena-iri is not up-to-date with all the RFCs.
>
> The RDF/XML file itself is fine. The issue is the base URI in the parser
> setup.
>
> The URN scheme urn:uuid: defines the rest of the URI to match the
> syntax of a UUID: 671940cc-e6b5-47ad-9992-2d9185f53464
>
> RFC 8141 defines URNs as urn:NID:NSS -- it tightened up on URN syntax to
> require at least two characters in the middle part (NID) and one in the
> final part (NSS). It also permitted fragments, which were not in the first
> URN RFC.
>
>
> So  --
>
> * is legal by URI syntax,
> * not correct by the details of a URN (must have 2 colons)
> * not correct by the details of the urn:uuid namespace (RFC 4122).
>
> If you use a legal base, the file parses OK.
> Is that possible for you?
>
> urn:uid:abc
> http://example.org/
>
> (UID isn't registered -- and also Jena only has scheme-specific rules
> for certain URI and URN registrations.)
>
> Andy
>
> https://www.rfc-editor.org/rfc/rfc8141.html
> https://www.rfc-editor.org/rfc/rfc4122.html
>
> PS There will be a transition legacy route to get to the 4.7.0 parser
> but that is temporary.
>
> On 03/03/2023 21:47, Arne Bernhardt wrote:
> > Hello,
> > the following code, which works fine under Jena 4.6, no longer works
> under
> > Jena 4.8.SNAPSHOT:
> >
> > RDFParser.create()
> >  .source(graphUri)
> >  .base("urn:uuid")
> >  .lang(Lang.RDFXML)
> >  .parse(streamSink);
> >
> > The graph looks like this:
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <rdf:RDF xmlns:cim="http://iec.ch/TC57/CIM100#"
> >          xmlns:md="http://iec.ch/TC57/61970-552/ModelDescription/1#"
> >          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> >          xmlns:eu="http://iec.ch/TC57/CIM100-European#">
> >   <cim:LoadArea rdf:ID="_5b5b515b-91bb-41c6-ba63-71a711139a86">
> >     <cim:IdentifiedObject.name>1555284823 LoadArea</cim:IdentifiedObject.name>
> >     <cim:IdentifiedObject.mRID>5b5b515b-91bb-41c6-ba63-71a711139a86</cim:IdentifiedObject.mRID>
> >   </cim:LoadArea>
> >   <cim:SubLoadArea rdf:ID="_27f108dd-e578-4921-8d3a-753e67bd718e">
> >     <cim:IdentifiedObject.name>1055343234 SubLoadArea</cim:IdentifiedObject.name>
> >     <cim:SubLoadArea.LoadArea rdf:resource="#_5b5b515b-91bb-41c6-ba63-71a711139a86" />
> >     <cim:IdentifiedObject.mRID>27f108dd-e578-4921-8d3a-753e67bd718e</cim:IdentifiedObject.mRID>
> >   </cim:SubLoadArea>
> > </rdf:RDF>
> >
> > The error is: "org.apache.jena.riot.RiotException: [line: 3, col: 64]
> > {E214} Resolving against bad URI :
> > <#_5b5b515b-91bb-41c6-ba63-71a711139a86>"
> >
> > The example is an extract from the CGMES Conformity Assessment Scheme v3
> -
> > Test Configurations (
> > https://www.entsoe.eu/data/cim/cim-conformity-and-interoperability/ ->
> >
> https://www.entsoe.eu/Documents/CIM_documents/Grid_Model_CIM/ENTSO-E_Test_Configurations_v3.0.2.zip
> > ).
> >
> > Could my problem be related to the changes in
> > https://github.com/apache/jena/issues/1773?
> > Are my options or my base URI wrong?
> > Or if the format is wrong, what specification does it violate? (I haven't
> > figured out this URI/IRI thing yet, maybe I haven't found the right
> sources
> > for it).
> > How do I get Jena to accept the file, preferably as is?
> >
> > Greetings
> > Arne
> >
>


Resolving against bad URI - parsing CIM RDF/XML reference data for CGMES with Jena 4.8.SNAPSHOT

2023-03-03 Thread Arne Bernhardt
Hello,
the following code, which works fine under Jena 4.6, no longer works under
Jena 4.8.SNAPSHOT:

RDFParser.create()
.source(graphUri)
.base("urn:uuid")
.lang(Lang.RDFXML)
.parse(streamSink);

The graph looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:cim="http://iec.ch/TC57/CIM100#"
         xmlns:md="http://iec.ch/TC57/61970-552/ModelDescription/1#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eu="http://iec.ch/TC57/CIM100-European#">
  <cim:LoadArea rdf:ID="_5b5b515b-91bb-41c6-ba63-71a711139a86">
    <cim:IdentifiedObject.name>1555284823 LoadArea</cim:IdentifiedObject.name>
    <cim:IdentifiedObject.mRID>5b5b515b-91bb-41c6-ba63-71a711139a86</cim:IdentifiedObject.mRID>
  </cim:LoadArea>
  <cim:SubLoadArea rdf:ID="_27f108dd-e578-4921-8d3a-753e67bd718e">
    <cim:IdentifiedObject.name>1055343234 SubLoadArea</cim:IdentifiedObject.name>
    <cim:SubLoadArea.LoadArea rdf:resource="#_5b5b515b-91bb-41c6-ba63-71a711139a86" />
    <cim:IdentifiedObject.mRID>27f108dd-e578-4921-8d3a-753e67bd718e</cim:IdentifiedObject.mRID>
  </cim:SubLoadArea>
</rdf:RDF>

The error is: "org.apache.jena.riot.RiotException: [line: 3, col: 64]
{E214} Resolving against bad URI :
<#_5b5b515b-91bb-41c6-ba63-71a711139a86>"

The example is an extract from the CGMES Conformity Assessment Scheme v3 -
Test Configurations (
https://www.entsoe.eu/data/cim/cim-conformity-and-interoperability/ ->
https://www.entsoe.eu/Documents/CIM_documents/Grid_Model_CIM/ENTSO-E_Test_Configurations_v3.0.2.zip
).

Could my problem be related to the changes in
https://github.com/apache/jena/issues/1773?
Are my options or my base URI wrong?
Or if the format is wrong, what specification does it violate? (I haven't
figured out this URI/IRI thing yet, maybe I haven't found the right sources
for it).
How do I get Jena to accept the file, preferably as is?

Greetings
Arne