[GitHub] jena pull request #314: JENA-1430
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/314#discussion_r153813903 --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java --- @@ -26,25 +30,29 @@ import org.apache.jena.atlas.logging.Log ; import org.apache.jena.query.Dataset ; import org.apache.jena.query.DatasetFactory ; -import org.apache.jena.rdf.model.Model ; -import org.apache.jena.rdf.model.RDFNode ; -import org.apache.jena.rdf.model.Resource ; +import org.apache.jena.rdf.model.*; import org.apache.jena.sparql.graph.GraphFactory ; import org.apache.jena.sparql.util.FmtUtils ; import org.apache.jena.sparql.util.graph.GraphUtils ; +import org.apache.jena.system.Txn; public class DatasetAssembler extends AssemblerBase implements Assembler { public static Resource getType() { return DatasetAssemblerVocab.tDataset ; } @Override -public Object open(Assembler a, Resource root, Mode mode) { +public Dataset open(Assembler a, Resource root, Mode mode) { Dataset ds = createDataset(a, root, mode) ; return ds ; } public Dataset createDataset(Assembler a, Resource root, Mode mode) { +checkType(root, DatasetAssemblerVocab.tDataset); +// use TIM if quads are loaded or if all named Graphs are loaded via data property +final boolean allNamedGraphsLoadViaData = multiValueResource(root, pNamedGraph).stream().allMatch(g -> g.hasProperty(data)); +if (root.hasProperty(data) || allNamedGraphsLoadViaData) return new InMemDatasetAssembler().open(a, root, mode); --- End diff -- Much clearer than otherwise, to my eye. This is style, and I'm happy to change this to be clearer for you, but it's not an objective question. ---
[GitHub] jena pull request #317: JENA-1440: TDB2 - transform bytes to NodeIds directl...
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/317#discussion_r153284608 --- Diff: jena-db/jena-dboe-base/src/main/java/org/apache/jena/dboe/base/buffer/RecordBufferIteratorMapper.java --- @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.jena.dboe.base.buffer; + +import static org.apache.jena.atlas.lib.Alg.decodeIndex ; + +import java.util.Iterator; +import java.util.NoSuchElementException; + +import org.apache.jena.atlas.lib.Bytes; +import org.apache.jena.dboe.base.record.Record; +import org.apache.jena.dboe.base.record.RecordMapper; + +// Iterate over one RecordBuffer +public class RecordBufferIteratorMapper<X> implements Iterator<X> +{ +private RecordBuffer rBuff ; +private int nextIdx ; +private X slot = null ; +private final byte[] keySlot ; +private final Record maxRec ; +private final Record minRec ; +private final RecordMapper<X> mapper; + +//RecordBufferIteratorMapper(RecordBuffer rBuff) +//{ this(rBuff, null, null); } + +RecordBufferIteratorMapper(RecordBuffer rBuff, Record minRecord, Record maxRecord, int keyLen, RecordMapper<X> mapper) +{ +this.rBuff = rBuff ; +this.mapper = mapper ; +this.keySlot = (maxRecord==null) ? 
null : new byte[keyLen]; +nextIdx = 0 ; +minRec = minRecord ; +if ( minRec != null ) +{ +nextIdx = rBuff.find(minRec) ; +if ( nextIdx < 0 ) +nextIdx = decodeIndex(nextIdx) ; +} + +maxRec = maxRecord ; +} + +private void finish() +{ +rBuff = null ; +nextIdx = -99 ; --- End diff -- Might be nice to call this out as a constant, like `NO_NEXT_INDEX` or the like. ---
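A minimal sketch of the suggestion, assuming a much-simplified iterator: `SentinelIterator` and its `int[]` buffer are hypothetical stand-ins for `RecordBufferIteratorMapper` and `RecordBuffer`. The magic `-99` becomes a named constant that both `finish()` and `hasNext()` refer to, so the "exhausted" state has one definition.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical, simplified iterator used only to show a named sentinel
// constant in place of the magic number -99. Not the Jena class.
final class SentinelIterator implements Iterator<Integer> {
    // Named sentinel: "there is no next index; the iterator is finished".
    static final int NO_NEXT_INDEX = -99;

    private int[] values;
    private int nextIdx = 0;

    SentinelIterator(int[] values) {
        this.values = values;
    }

    @Override
    public boolean hasNext() {
        // Check the sentinel first, so a finished iterator never touches the released buffer.
        if (nextIdx == NO_NEXT_INDEX)
            return false;
        if (nextIdx >= values.length) {
            finish();
            return false;
        }
        return true;
    }

    @Override
    public Integer next() {
        if (!hasNext())
            throw new NoSuchElementException();
        return values[nextIdx++];
    }

    // Release the buffer and mark the iterator as exhausted.
    private void finish() {
        values = null;
        nextIdx = NO_NEXT_INDEX;
    }

    boolean isFinished() {
        return nextIdx == NO_NEXT_INDEX;
    }
}
```

Comparing against the constant in `hasNext()` also documents intent at the read site, which a bare `-99` does not.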
Re: Jena 3.6.0?
Right, I just wouldn't want to make 3.6.0 wait on it if the other stuff gets done. ajs6f > On Nov 27, 2017, at 9:51 AM, Andy Seaborne wrote: > > > > On 27/11/17 14:30, ajs6f wrote: >> Comments inline... >> ajs6f >>> On Nov 27, 2017, at 8:10 AM, Andy Seaborne wrote: >>> >>> ... >>> 1/ The jena-text documentation improvements >> Is this required for or by a release? Can we not do this independently? > > Required? No. > > It needs doing and the website gets updated on release. > >Andy
Re: Jena 3.6.0?
Comments inline... ajs6f > On Nov 27, 2017, at 8:10 AM, Andy Seaborne wrote: > > ... > 1/ The jena-text documentation improvements Is this required for or by a release? Can we not do this independently? > 2/ Downgrade shiro to 1.2.6 > 3/ riot: status code on warnings (#315) +1 to merging; I would ideally like to confirm the fix with Ian Dickinson before closing the ticket. > 4/ Ideally, dataset assembler (#314) [might be too tight for time]. Waiting on feedback from Andy (and anyone else who might be interested). > Anything else? 1391 is still hanging, but with a release this close I don't think I can write enough tests before then to feel comfortable sending a PR, so let's leave it be. > > Rob - I can merge #315 and we can sort out the implementation stuff later. > >Andy > > On 25/11/17 23:45, ajs6f wrote: >> Ditto, except for me it's the 8th. >> ajs6f >>> On Nov 25, 2017, at 6:12 PM, Bruno P. Kinoshita >>> wrote: >>> >>> I can run the build and verify signatures any day in the next weeks. Just >>> not much time to properly test Fuseki and review changes until after Dec >>> 3rd. >>> CheersBruno >>> >>> From: Andy Seaborne >>> To: "dev@jena.apache.org" >>> Sent: Sunday, 26 November 2017 12:02 PM >>> Subject: Jena 3.6.0? >>> >>> The bug in Fuseki that causes UI uploads to fail, and some other UI >>> issues, is a bit annoying. >>> >>> Is there the energy and time to vote on a 3.6.0 release if I build one? >>> Please respond if you'll be able to vote in the next few weeks. >>> >>> If there is - from our experience last time, we can test the latest >>> development builds now, before a formal VOTE which will shorten the time >>> in case there is any problems to address. >>> >>> Andy >>> >>> The build is complaining about a Shiro issue - this is harmless and a >>> problem somewhere in the Fuseki tests. Some state is getting initialized >>> twice. It does not happen when Fuseki is run nor does it cause any >>> tests to fail. 
It happens because of the 1.2.4->1.4.0 Shiro upgrade ; >>> it comes in at 1.2.6 -> 1.3.0. Solution: ship with 1.2.6 >>> >>> """ >>> [...] IniRealm WARN Users or Roles are already populated. Configured >>> Ini instance will be ignored. >>> """ >>> >>> Andy >>> >>> >>>
Re: Jena 3.6.0?
Ditto, except for me it's the 8th. ajs6f > On Nov 25, 2017, at 6:12 PM, Bruno P. Kinoshita > wrote: > > I can run the build and verify signatures any day in the next weeks. Just not > much time to properly test Fuseki and review changes until after Dec 3rd. > CheersBruno > > From: Andy Seaborne > To: "dev@jena.apache.org" > Sent: Sunday, 26 November 2017 12:02 PM > Subject: Jena 3.6.0? > > The bug in Fuseki that causes UI uploads to fail, and some other UI > issues, is a bit annoying. > > Is there the energy and time to vote on a 3.6.0 release if I build one? > Please respond if you'll be able to vote in the next few weeks. > > If there is - from our experience last time, we can test the latest > development builds now, before a formal VOTE which will shorten the time > in case there is any problems to address. > > Andy > > The build is complaining about a Shiro issue - this is harmless and a > problem somewhere in the Fuseki tests. Some state is getting initialized > twice. It does not happen when Fuseki is run nor does it cause any > tests to fail. It happens because of the 1.2.4->1.4.0 Shiro upgrade ; > it comes in at 1.2.6 -> 1.3.0. Solution: ship with 1.2.6 > > """ > [...] IniRealm WARN Users or Roles are already populated. Configured > Ini instance will be ignored. > """ > > Andy > > >
Re: CMS diff: DB2 - Use with Fuseki2
Committed, thanks! ajs6f > On Nov 25, 2017, at 2:40 PM, Laura wrote: > > Clone URL (Committers only): > https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Ftdb2%2Ftdb2_fuseki.md > > Laura > > Index: trunk/content/documentation/tdb2/tdb2_fuseki.md > === > --- trunk/content/documentation/tdb2/tdb2_fuseki.md (revision 1816255) > +++ trunk/content/documentation/tdb2/tdb2_fuseki.md (working copy) > @@ -17,7 +17,7 @@ > PREFIX fuseki: <http://jena.apache.org/fuseki#> > PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> > PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> > -PREFIX tdb2:<http://jena.apache.org/2016/tdb#>; > +PREFIX tdb2:<http://jena.apache.org/2016/tdb#> > PREFIX ja: <http://jena.hpl.hp.com/2005/11/Assembler#> > > [] rdf:type fuseki:Server ; >
Re: mapping URIs
Claude, are you saying you want people to be able to query Fuseki using urn:foo:bar:yeehaw and get back answers using http://server:8080/yeehaw? Otherwise, I'm guessing I'm missing something, but why wouldn't you do the substitutions on the way from the backend to Fuseki? ajs6f > On Nov 22, 2017, at 12:13 PM, Claude Warren wrote: > > I have a case where data are generated in a backend system that is not > publicly accessible and has no idea where the data are going to be served > from. > > The backend system generates URNs like "" > > What I think I want to do is on the fuseki server be able to configure > "urn:foo:bar" as a place holder for "http://server:8080/yeehaw". > > Now, I know I can add this as part of an OWL:sameValue but I would like to > see Fuseki do that. > > In this way when the data are hosted on another system the resolution can > be adjusted appropriately. > > Perhaps this does not make sense. Perhaps there is a way to do this > already. Perhaps this is a really bad idea. So I am throwing it out there > to see if there are any comments. > > Thx, > Claude > > -- > I like: Like Like - The likeliest place on the web > <http://like-like.xenei.com> > LinkedIn: http://www.linkedin.com/in/claudewarren
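One reading of "do the substitutions on the way from the backend to Fuseki" is a plain prefix rewrite applied to every URI before loading. A minimal sketch, assuming a string mapping from `urn:foo:bar:` to `http://server:8080/`; `UriRewriter` is a hypothetical helper, not a Jena or Fuseki feature.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: rewrites backend placeholder URNs to public HTTP URIs
// before the data is handed to Fuseki. Not part of any Jena or Fuseki API.
final class UriRewriter {
    // Insertion-ordered, so the first matching prefix wins.
    private final Map<String, String> prefixes = new LinkedHashMap<>();

    UriRewriter map(String fromPrefix, String toPrefix) {
        prefixes.put(fromPrefix, toPrefix);
        return this;
    }

    // Returns the rewritten URI, or the input unchanged if no prefix matches.
    String rewrite(String uri) {
        for (Map.Entry<String, String> e : prefixes.entrySet()) {
            if (uri.startsWith(e.getKey()))
                return e.getValue() + uri.substring(e.getKey().length());
        }
        return uri;
    }
}
```

With `map("urn:foo:bar:", "http://server:8080/")`, `rewrite("urn:foo:bar:yeehaw")` gives `http://server:8080/yeehaw`. In a real pipeline the rewrite would be applied to each subject, predicate and object URI while streaming data out of the backend; that wiring is not shown. Moving the data to another host then only changes the mapping.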
[GitHub] jena issue #314: JENA-1430
Github user ajs6f commented on the issue: https://github.com/apache/jena/pull/314 @afs What do you think of that? It's clearer, I think, along the lines [you suggested](https://github.com/apache/jena/pull/314#discussion_r152289270). ---
[GitHub] jena pull request #314: JENA-1430
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/314#discussion_r152397704 --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java --- @@ -58,27 +64,33 @@ public Dataset createDataset(Assembler a, Resource root, Mode mode) { // Assembler description did not define one. dftModel = GraphFactory.makeDefaultModel() ; Dataset ds = DatasetFactory.create(dftModel) ; -// Named graphs -List<RDFNode> nodes = GraphUtils.multiValue(root, DatasetAssemblerVocab.pNamedGraph) ; -for ( RDFNode n : nodes ) { -if ( !(n instanceof Resource) ) -throw new DatasetAssemblerException(root, "Not a resource: " + FmtUtils.stringForRDFNode(n)); -Resource r = (Resource)n; - -String gName = GraphUtils.getAsStringValue(r, DatasetAssemblerVocab.pGraphName); -Resource g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraph); -if ( g == null ) { -g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraphAlt); -if ( g != null ) { -Log.warn(this, "Use of old vocabulary: use :graph not :graphData"); -} else { -throw new DatasetAssemblerException(root, "no graph for: " + gName); +Txn.executeWrite(ds, () -> { +// Load data into the default graph or quads into the dataset. +multiValueAsString(root, data) +.forEach(dataURI -> read(ds, dataURI)); + --- End diff -- @afs What's a good idiom for switching to a new assembler? In other words, let's say the code tests for the presence of quads and finds them and it's time to pivot to TIM. Obviously, I could just `new InMemDatasetAssembler()`, but I'm thinking there must be a more elegant way. I looked at `AssemblerUtils` but only saw ways to register `Assembler`s, not retrieve them… ---
[GitHub] jena pull request #314: JENA-1430
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/314#discussion_r152311436 --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java --- @@ -58,27 +64,33 @@ public Dataset createDataset(Assembler a, Resource root, Mode mode) { // Assembler description did not define one. dftModel = GraphFactory.makeDefaultModel() ; Dataset ds = DatasetFactory.create(dftModel) ; -// Named graphs -List<RDFNode> nodes = GraphUtils.multiValue(root, DatasetAssemblerVocab.pNamedGraph) ; -for ( RDFNode n : nodes ) { -if ( !(n instanceof Resource) ) -throw new DatasetAssemblerException(root, "Not a resource: " + FmtUtils.stringForRDFNode(n)); -Resource r = (Resource)n; - -String gName = GraphUtils.getAsStringValue(r, DatasetAssemblerVocab.pGraphName); -Resource g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraph); -if ( g == null ) { -g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraphAlt); -if ( g != null ) { -Log.warn(this, "Use of old vocabulary: use :graph not :graphData"); -} else { -throw new DatasetAssemblerException(root, "no graph for: " + gName); +Txn.executeWrite(ds, () -> { +// Load data into the default graph or quads into the dataset. +multiValueAsString(root, data) +.forEach(dataURI -> read(ds, dataURI)); + --- End diff -- Okay, if I get what you are saying, it's: 1. Check to see if quads are being loaded, if so, TIM. 2. Otherwise, check the named graphs. If they are all `ja:data` guys, then TIM again. 3. Otherwise, general dataset. I'll get on this later today. ---
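The three-step rule above can be sketched as a small decision function. This is a simplified model, not Jena code: `AssemblerChoice`, `Kind` and `NamedGraphDesc` are hypothetical stand-ins for the assembler description (whether `root` has `ja:data`, and whether each `ja:namedGraph` entry is loaded via `ja:data`).

```java
import java.util.List;

// Simplified, hypothetical model of the dataset-assembler choice discussed
// for JENA-1430. None of these types exist in Jena.
final class AssemblerChoice {
    enum Kind { TIM, GENERAL }

    // Stand-in for one ja:namedGraph description.
    static final class NamedGraphDesc {
        final boolean loadedViaData; // true if the graph's content comes from ja:data
        NamedGraphDesc(boolean loadedViaData) { this.loadedViaData = loadedViaData; }
    }

    static Kind choose(boolean rootHasData, List<NamedGraphDesc> namedGraphs) {
        // 1. Data (possibly quads) is loaded at the dataset level: use TIM.
        if (rootHasData)
            return Kind.TIM;
        // 2. Every named graph is loaded via ja:data: use TIM again.
        //    Stream.allMatch returns true for an empty stream.
        if (namedGraphs.stream().allMatch(g -> g.loadedViaData))
            return Kind.TIM;
        // 3. Otherwise, build a general dataset.
        return Kind.GENERAL;
    }
}
```

Because `Stream.allMatch` is `true` on an empty stream, a description with no `ja:data` on the root and no named graphs at all also takes the TIM branch under this rule, just as with the `allMatch` call in the `createDataset` diff for #314.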
[GitHub] jena pull request #314: JENA-1430
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/314#discussion_r152284128 --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java --- @@ -58,27 +64,33 @@ public Dataset createDataset(Assembler a, Resource root, Mode mode) { // Assembler description did not define one. dftModel = GraphFactory.makeDefaultModel() ; Dataset ds = DatasetFactory.create(dftModel) ; -// Named graphs -List<RDFNode> nodes = GraphUtils.multiValue(root, DatasetAssemblerVocab.pNamedGraph) ; -for ( RDFNode n : nodes ) { -if ( !(n instanceof Resource) ) -throw new DatasetAssemblerException(root, "Not a resource: " + FmtUtils.stringForRDFNode(n)); -Resource r = (Resource)n; - -String gName = GraphUtils.getAsStringValue(r, DatasetAssemblerVocab.pGraphName); -Resource g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraph); -if ( g == null ) { -g = GraphUtils.getResourceValue(r, DatasetAssemblerVocab.pGraphAlt); -if ( g != null ) { -Log.warn(this, "Use of old vocabulary: use :graph not :graphData"); -} else { -throw new DatasetAssemblerException(root, "no graph for: " + gName); +Txn.executeWrite(ds, () -> { +// Load data into the default graph or quads into the dataset. +multiValueAsString(root, data) +.forEach(dataURI -> read(ds, dataURI)); + --- End diff -- I'm trying to think of a use case for "load quads into non-TIM" and the one that occurs to me is in an embedded or integrated situation where you have a lot of quads, like so many that you prefer the memory-parsimonious-ness of the general IM dataset, maybe because you have other processes running in the system. Sound likely enough to merit (2)? ---
[GitHub] jena pull request #314: JENA-1430
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/314#discussion_r152283404 --- Diff: jena-fuseki2/examples/fuseki-in-mem-txn.ttl --- @@ -0,0 +1,24 @@ +## Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0 + +@prefix :<#> . +@prefix fuseki: <http://jena.apache.org/fuseki#> . +@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . +@prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#> . +@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> . +@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> . + +<#serviceInMemory> rdf:type fuseki:Service; +rdfs:label "In-memory, trasnactioal dataset."; +fuseki:name "ds"; +fuseki:serviceQuery "query"; +fuseki:serviceQuery "sparql"; +fuseki:serviceUpdate "update"; +fuseki:serviceUpload "upload" ; +fuseki:serviceReadGraphStore "data" ; +fuseki:serviceReadGraphStore "get" ; +fuseki:dataset <#dataset> ; +. + +<#dataset> rdf:type ja:DatasetTxnMem; --- End diff -- Nice catch, thanks, fixed. ---
[GitHub] jena issue #306: Algorithms for JENA-1414
Github user ajs6f commented on the issue: https://github.com/apache/jena/pull/306 Okay, now I get it. Agreed that number 3 is "trying too hard", and agreed on the proposal to provide number 2 and document appropriate usage. ---
[GitHub] jena pull request #306: Algorithms for JENA-1414
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/306#discussion_r152085996 --- Diff: jena-core/src/main/java/org/apache/jena/graph/GraphUtil.java --- @@ -246,43 +282,214 @@ private static void deleteIteratorWorkerDirect(Graph graph, Iterator it) } } -private static final int sliceSize = 1000 ; -/** A safe and cautious remove() function that converts the remove to - * a number of {@link Graph#delete(Triple)} operations. +private static int MIN_SRC_SIZE = 1000 ; +// If source and destination are large, limit the search for the best way round to "deleteFrom" +private static int MAX_SRC_SIZE = 1000*1000 ; +private static int DST_SRC_RATIO = 2 ; + +/** + * Delete triples in the destination (arg 1) as given in the source (arg 2). + * + * @implNote + * This is designed for the case of {@code dstGraph} being comparable or much larger than + * {@code srcGraph} or {@code srcGraph} having a lot of triples to actually be + * deleted from {@code dstGraph}. This includes large, persistent {@code dstGraph}. + * + * It is not designed for a large {@code srcGraph} and large {@code dstGraph} + * with only a few triples in common delete from {@code dstGraph}. It is better to + * calculate the difference in someway, and copy into a small graph to use as the {@srcGraph}. --- End diff -- typo: some way ---
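The implNote is about which graph to drive the deletion from. A minimal sketch of that general idea, assuming in-memory sets of strings stand in for graphs of triples: iterate whichever side is smaller. This is an illustration only; it does not model the `MIN_SRC_SIZE`, `MAX_SRC_SIZE` or `DST_SRC_RATIO` thresholds in the diff, and `DeleteFrom` is a hypothetical name.

```java
import java.util.Set;

// Illustration only: choose which side to iterate when deleting the contents
// of src from dst. Sets stand in for graphs; this is not Jena's GraphUtil logic.
final class DeleteFrom {
    static void deleteAll(Set<String> dst, Set<String> src) {
        if (src.size() <= dst.size()) {
            // Small source: look each source item up in the destination.
            for (String t : src)
                dst.remove(t);
        } else {
            // Small destination: scan it and drop anything the source contains.
            dst.removeIf(src::contains);
        }
    }
}
```

`java.util.AbstractSet.removeAll` documents the same size comparison for in-memory sets. The Jena case is harder because one side may be a large persistent graph, where size and lookup costs differ from a `HashSet`.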
[GitHub] jena pull request #314: JENA-1430
GitHub user ajs6f opened a pull request: https://github.com/apache/jena/pull/314 JENA-1430 Includes #313, plus: - Extend testing to `DatasetAssembler` - Ensure that `DatasetAssembler` can also load quads - Correct `ja:DatasetTxnMem` => `ja:MemoryDataset` You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajs6f/jena JENA-1430p Alternatively you can review and apply these changes as the patch at: https://github.com/apache/jena/pull/314.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #314 commit d174ec04dccb205de96e63c775e01f948380f8cc Author: Andy Seaborne Date: 2017-11-20T10:57:01Z JENA-1430: Read quads for ja:data by filename commit 3e13dc64f4047eb589d9da46e50561a25290a230 Author: ajs6f Date: 2017-11-20T18:47:42Z JENA-1430: Quad loading for in-memory assemblers ---
Re: CMS diff: Jena Full Text Search
I went to review this diff and rediscovered (to my chagrin) that I really know very little about Jena's text indexing. Osma (or anyone else who knows text indexing better than do I, which wouldn't take much)-- could you review this? It's got some great useful detail about how the indexing works and can be used. ajs6f > On Nov 20, 2017, at 1:51 AM, Chris Tomlinson wrote: > > Clone URL (Committers only): > https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Fquery%2Ftext-query.mdtext > > Chris Tomlinson > > Index: trunk/content/documentation/query/text-query.mdtext > === > --- trunk/content/documentation/query/text-query.mdtext (revision > 1815762) > +++ trunk/content/documentation/query/text-query.mdtext (working copy) > @@ -1,5 +1,7 @@ > Title: Jena Full Text Search > > +Title: Jena Full Text Search > + > This extension to ARQ combines SPARQL and full text search via > [Lucene](https://lucene.apache.org) 6.4.1 or > [ElasticSearch](https://www.elastic.co) 5.2.1 (which is built on > @@ -64,7 +66,20 @@ > ## Table of Contents > > - [Architecture](#architecture) > +- [External content](#external-content) > +- [External applications](#external-applications) > +- [Document structure](#document-structure) > - [Query with SPARQL](#query-with-sparql) > +- [Syntax](#syntax) > +- [Input arguments](#input-arguments) > +- [Output arguments](#output-arguments) > +- [Query strings](#query-strings) > +- [Simple queries](#simple-queries) > +- [Queries with language tags](#queries-with-language-tags) > +- [Queries that retrieve literals](#queries-that-retrieve-literals) > +- [Queries across multiple > `Field`s](#queries-across-multiple-fields) > +- [Queries within a `Field`](#queries-within-a-field) > +- [Good practice](#good-practice) > - [Configuration](#configuration) > - [Text Dataset Assembler](#text-dataset-assembler) > - [Configuring an analyzer](#configuring-an-analyzer) > @@ -134,6 +149,69 @@ > By using Elasticsearch, other 
applications can share the text index with > SPARQL search. > > +### Document structure > + > +As mentioned above, text indexing of a triple involves associating a Lucene > +document with the triple. How is this done? > + > +Lucene documents are composed of `Field`s. Indexing and searching are > performed > +over the contents of these `Field`s. For an RDF triple to be indexed in > Lucene the > +_property_ of the triple must be > +[configured in the entity map of a TextIndex](#entity-map-definition). > +This associates a Lucene analyzer with the _`property`_ which will be used > +for indexing and search. The _`property`_ becomes the _searchable_ Lucene > +`Field` in the resulting document. > + > +A Lucene index includes a _default_ `Field`, which is specified in the > configuration, > +that is the field to search if not otherwise named in the query. In > jena-text > +this field is configured via the `text:defaultField` property which is then > mapped > +to a specific RDF property via `text:predicate` (see [entity > map](#entity-map-definition) > +below). > + > +There are several additional `Field`s that will be included in the > +document that is passed to the Lucene `IndexWriter` depending on the > +configuration options that are used. These additional fields are used to > +manage the interface between Jena and Lucene and are not generally > +searchable per se. > + > +The most important of these additional `Field`s is the `text:entityField`. > +This configuration property defines the name of the `Field` that will contain > +the _URI_ or _blank node id_ of the _subject_ of the triple being indexed. > This property does > +not have a default and must be specified for most uses of `jena-text`. This > +`Field` is often given the name, `uri`, in examples. 
It is via this `Field` > +that `?s` is bound in a typical use such as: > + > +select ?s > +where { > +?s text:query "some text" > +} > + > +Other `Field`s that may be configured: `text:uidField`, `text:graphField`, > +and so on are discussed below. > + > +Given the triple: > + > +ex:SomeOne skos:prefLabel "zorn protégé a prés"@fr ; > + > +The following illustrates a Lucene document that Jena will create and > +request Lucene to index: > + > +Document< > +stored, indexed, indexOptions=DOCS http://example.org/SomeOne> > +indexed, omitNorms, indexOptions=DOCS > > +stored, in
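To make the document structure concrete, here is a deliberately simplified model in plain Java. It assumes an entity field named `uri` and an entity map that sends `skos:prefLabel` to a field named `label` (that field name is an assumption). Plain maps stand in for Lucene's `Document` and `Field`; this is not how jena-text builds documents internally, and the extra fields it may add (for `text:uidField`, `text:graphField` and so on) are omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Simplified, hypothetical model: a "document" is a field-name -> value map.
final class EntityDocs {
    static Optional<Map<String, String>> toDocument(String entityField,
            Map<String, String> entityMap, String subject, String predicate, String text) {
        String field = entityMap.get(predicate);
        if (field == null)
            return Optional.empty();     // predicate not in the entity map: not indexed
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put(entityField, subject);   // what ?s is bound from in  ?s text:query "..."
        doc.put(field, text);            // the searchable field for this property
        return Optional.of(doc);
    }
}
```

A triple whose predicate is not configured in the entity map produces no document, which matches the requirement that the property must be configured before a triple can be indexed.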
Re: gitpubsub
Bruno (or anyone), do you know if it would be possible to publish site changes for review out of Apache CI? (Something like the way we can set up to get built artifacts from branches of the codebase without actually releasing them.) Is it okay with respect to Apache policy to only import the current state of the site to Git (iow to leave behind that massive accumulation of Javadocs), or do we need to maintain a complete history on whatever infrastructure we use? ajs6f > On Nov 17, 2017, at 3:30 AM, Bruno P. Kinoshita > wrote: > >> What changes if we go for gitpubsub? > > > Not much for end users. For developers, we would need to get used to > whichever tool we choose for static site generator. > > >> If I read that right, no CMS because CMS is svnpubsub only. Is it a "big >> bang" switch to Jekyll? That isn't too scary but it is a step-change. > > Not much I think. Most of the Markdown can be easily ported with some > regex/shell script. When I helped porting OpenNLP's site, I used Jena website > as reference for parts of their new layout and general organization. If you > open both sites opennlp.apache.org and jena.apache.org, you may find they are > both very similar. > > And we don't have to necessarily use Jekyll. If the consensus is for another > tool (e.g. Pelican, Hexo, JBake, etc) we just need to confirm with Apache > Infra if they are able to run the same tool in their automation pipeline. > > >> One thing we do benefit from currently is content fixes via CMS - we may >> have to change that. I guess there is no jena.staging.a.o? It becomes local >> Jekyll build? > > As far as I know, that is right. However, users can run something like > `jekyll serve`. I like the current process, but if you have a great change, > it is hard to get feedback without committing to SVN, having some draft in > the staging area. > > With the gitpubsub + some static site generator. Or we can even share our own > GitHub fork website. 
OpenNLP template has an issue with extra paths, so this > is broken, but we can work to have Jena website working correctly, and send a > pull request to opennlp's repo: https://kinow.github.io/opennlp-site/. > > So if we have a new repository like github.com/apache/jena-site, then I could > fork it under github.com/kinow/jena-site, work in my own fork, prepare pull > requests, and include a link like https://kinow.github.io/jena-site. I prefer > this approach to having to `svn commit` to preview in the staging area. > > >> A project can have more then one git repo so I guess we can choose whether >> to use the main repo or not. Our site .svn is 2.2G (probably all those >> javadoc changes). Or a separate repo git-include-submodule in the main one? > > Oh, very good point. OpenNLP has/had the same issue. Not sure if that was > fixed. Their old docs are served here: > http://opennlp.apache.org/docs/legacy.html > > I believe it's done here: > https://github.com/apache/opennlp-site/blob/0303866c56689f602dc9258b32e1a64f59ea82e4/pom.xml#L204 > > Though not entirely sure how it works. I can join the Slack channel next week > and check with them. The first version of the site included all the old > javadocs, and was quite slow to check out and build. > > There was some service interruption during the Apache Infra automation > set-up. But given OpenNLP just went through the process, it would be simpler, > as we could just tell them to look at the job and instead of Maven/JBake, run > jekyll or whatever tool we choose. I would be happy to volunteer and create > ticket to create jena-site repository in GitHub. Then once we have the site > being generated there and we have validated it, I can create the ticket for > INFRA to set up the automation, and switch from svnpubsub to gitpubsub. > > > Cheers > Bruno > > > > > From: Andy Seaborne > To: dev@jena.apache.org > Sent: Sunday, 12 November 2017 4:56 AM > Subject: gitpubsub > > > > > On 09/11/17 20:51, Bruno P. 
Kinoshita wrote: > ... >> However, I'm +1 for moving our site to Git. > > What changes if we go for gitpubsub? > > All I know about it is the bullet point on > https://www.apache.org/dev/project-site.html. > > If I read that right, no CMS because CMS is svnpubsub only. Is it a > "big bang" switch to Jekyll? That isn't too scary but it is a step-change. > > One thing we do benefit from currently is content fixes via CMS - we may > have to change that. I guess there is no jena.staging.a.o? It becomes > local Jekyll build? > > A project can have more then one git repo so I guess we can choose > whether to use the main repo or not. Our site .svn is 2.2G (probably > all those javadoc changes). Or a separate repo git-include-submodule in > the main one? > > Andy
[GitHub] jena issue #312: Use top POM as parent.
Github user ajs6f commented on the issue: https://github.com/apache/jena/pull/312 Agreed: with SVN you deal in versions and there is a fairly natural mapping to modules; in a DVCS like git you deal with deltas, and the module boundaries aren't as useful a way to organize change management. We can always go full-on OSGi and run everything through dynamic services for full module decoupling! :stuck_out_tongue_winking_eye: ---
[GitHub] jena issue #312: Use top POM as parent.
Github user ajs6f commented on the issue: https://github.com/apache/jena/pull/312 I agree that Jena doesn't have (and shouldn't have) a monolithic build, but do we want individual modules to be buildable separately? I'm not sure what the use case for that is... ---
Re: Generic RDFVisitor
Perhaps you can say a little more about your use case here? I think we could probably work something out for this feature, but I am curious: why are you reaching for the visitor pattern? ajs6f > On Nov 17, 2017, at 11:27 AM, Adam Jacobs wrote: > > Perhaps only a single generic parameter then, if each method should return > the same type. > Or a sub-interface in which all three parameters are the same, the way that > Java's `UnaryOperator` is related to `Function`. > > > ____ > From: ajs6f > Sent: Friday, November 17, 2017 10:01 AM > To: dev@jena.apache.org > Subject: Re: Generic RDFVisitor > > Not sure how that would play against: > > Object org.apache.jena.rdf.model.impl.ResourceImpl.visitWith(RDFVisitor) > > OTOH, I'm not sure how much use the visitor pattern there has ever really > gotten... > > ajs6f > >> On Nov 17, 2017, at 10:55 AM, Adam Jacobs wrote: >> >> I wonder if it would be useful to generify the `RDFVisitor` interface... >> >> public interface RDFVisitor<B, U, L> { >> >> B visitBlank( Resource r, AnonId id ); >> U visitURI( Resource r, String uri ); >> L visitLiteral( Literal l ); >> >> } >
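A sketch of the single-parameter variant, showing how a generic `visitWith` could replace the `Object` return type without casts at the call site. Everything here is hypothetical: `GenericRDFVisitor`, `RdfTerm` and the three node classes are stand-ins, not Jena's `RDFVisitor`, `Resource` or `Literal`.

```java
// Hypothetical generic visitor with a single result type R.
interface GenericRDFVisitor<R> {
    R visitBlank(String blankId);
    R visitURI(String uri);
    R visitLiteral(String lexicalForm);
}

// Stand-in for an RDF node; the generic method lets each call site pick R.
interface RdfTerm {
    <R> R visitWith(GenericRDFVisitor<R> visitor);
}

final class UriNode implements RdfTerm {
    final String uri;
    UriNode(String uri) { this.uri = uri; }
    public <R> R visitWith(GenericRDFVisitor<R> v) { return v.visitURI(uri); }
}

final class BlankNode implements RdfTerm {
    final String id;
    BlankNode(String id) { this.id = id; }
    public <R> R visitWith(GenericRDFVisitor<R> v) { return v.visitBlank(id); }
}

final class LiteralNode implements RdfTerm {
    final String lex;
    LiteralNode(String lex) { this.lex = lex; }
    public <R> R visitWith(GenericRDFVisitor<R> v) { return v.visitLiteral(lex); }
}

// Example visitor producing a Turtle-like string for each kind of node.
final class ToTurtle implements GenericRDFVisitor<String> {
    public String visitBlank(String id) { return "_:" + id; }
    public String visitURI(String uri) { return "<" + uri + ">"; }
    public String visitLiteral(String lex) { return "\"" + lex + "\""; }
}
```

With three type parameters the interface would read `RDFVisitor<B, U, L>`, and `visitWith` could only return a common supertype. The single `R` keeps the return type precise.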
[GitHub] jena issue #312: Use top POM as parent.
Github user ajs6f commented on the issue: https://github.com/apache/jena/pull/312 You got me with that dry English sense of humor. :wink: > No rush to make this change but aiming to change once would be better, especially if across a release. Good point. Let's do this gracefully instead of spastically. We could look at some other ways to slice verbiage out of the top POM, although I admit I can't think of anything that would take as large a slice as `dependencyManagement`. ---
[GitHub] jena issue #312: Use top POM as parent.
Github user ajs6f commented on the issue: https://github.com/apache/jena/pull/312 What? "standard techniques" is an ordinary Maven BOM. ---
Re: Generic RDFVisitor
Not sure how that would play against: Object org.apache.jena.rdf.model.impl.ResourceImpl.visitWith(RDFVisitor) OTOH, I'm not sure how much use the visitor pattern there has ever really gotten... ajs6f > On Nov 17, 2017, at 10:55 AM, Adam Jacobs wrote: > > I wonder if it would be useful to generify the `RDFVisitor` interface... > > public interface RDFVisitor<B, U, L> { > >B visitBlank( Resource r, AnonId id ); >U visitURI( Resource r, String uri ); >L visitLiteral( Literal l ); > > }
Re: jena-project
I'm basically +1 to this-- jena-project was always confusing at best. In theory, we could factor out some of those 932 lines with a Jena Maven BOM. Actually, that might be nice for integrators and those using apache-jena-lib. ajs6f > On Nov 17, 2017, at 10:12 AM, Andy Seaborne wrote: > > When we moved to one version for all modules, pressure of time pushed us to > have jena-project as a copy of the old jena-parent. > > Do we want to go the next step forward which is to merge jena-project into > the top POM and drop the jena-project module? > > It turns out to be quite easy to do. > > PR for discussion: > https://github.com/apache/jena/pull/312 > > It does make the top POM quite large - 932 lines. > > Thoughts? > >Andy
Jira and Gitbox integration?
Hi, INFRA-- Here at Jena we are considering reversing our Apache git <-> Github mirroring so that changes are accepted at Github and mirrored to Apache git (currently it's the other way around). But right now we have some nice Jira integrations, and so we have some questions about how that would work if we reversed the mirroring. Currently, any mention of a Jira ticket (e.g. "I think this could affect JENA-1234") in a Github PR automatically copies the conversation for that PR over to the comments in that Jira ticket. Will we be able to keep that integration if we reverse the mirroring? Github treats issues/tickets and PRs very similarly-- is it possible to integrate Jira in a similar way, so that a PR that doesn't mention an existing Jira ticket automatically files a new Jira ticket? Thanks for any info and all that you already do for us! ajs6f
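For concreteness, the integration described above reacts to ticket keys appearing in PR text. The sketch below only illustrates that kind of matching, plus the "no key found" case the auto-filing question is about; the `JENA-\d+` pattern and the `JiraKeyScanner` class are assumptions for illustration, not how the ASF GitHub/Jira bridge is actually implemented.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JiraKeyScanner {
    // Assumed pattern: project key "JENA", a hyphen, then digits, as a whole word.
    private static final Pattern JENA_KEY = Pattern.compile("\\bJENA-\\d+\\b");

    /** Returns the distinct ticket keys mentioned in the text, in order of first appearance. */
    public static Set<String> ticketKeys(String text) {
        Set<String> keys = new LinkedHashSet<>();
        Matcher m = JENA_KEY.matcher(text);
        while (m.find()) {
            keys.add(m.group());
        }
        return keys;
    }

    /** A PR mentioning no key is the case where auto-filing a new ticket would apply. */
    public static boolean needsNewTicket(String prText) {
        return ticketKeys(prText).isEmpty();
    }
}
```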
[GitHub] jena pull request #306: Algorithms for JENA-1414
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/306#discussion_r151692738 --- Diff: jena-base/src/main/java/org/apache/jena/atlas/iterator/Iter.java --- @@ -351,6 +351,22 @@ public void remove() { return filter(iter, Objects::nonNull) ; } +/** Step forward up to {@code steps} places. + * Return number of steps taken. --- End diff -- `@return number of steps taken` ---
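For context, a standalone sketch of the contract the suggested Javadoc describes: advance an iterator up to `steps` places and report how many steps were actually taken, which is fewer than requested if the iterator runs out. This is an illustrative reimplementation in a hypothetical `StepSketch` class, not the code under review in `Iter.java`.

```java
import java.util.Iterator;

public class StepSketch {
    /**
     * Step forward up to {@code steps} places.
     * @return number of steps taken
     */
    public static <T> int step(Iterator<T> iter, int steps) {
        int taken = 0;
        // Stop early if the iterator is exhausted before the requested count.
        while (taken < steps && iter.hasNext()) {
            iter.next(); // the element itself is discarded
            taken++;
        }
        return taken;
    }
}
```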
Re: TDB2 testing Re: TDB2 merged
> Adding a template name to the HTTP API would be good but IMO it's a long way > off to provide UI access. TDB1 works for people. This is true, but if we can give people an easy way to create TDB2 dbs and compare them apples-to-apples in their own systems, we will get more feedback more quickly. That having been said, I honestly do not know anything about how the Fuseki UI is coded. Is it done with a well-known template library? ajs6f > On Nov 16, 2017, at 3:02 PM, Andy Seaborne wrote: > > > > On 27/10/17 11:44, Osma Suominen wrote: >> Hi, >> As I've promised earlier I took TDB2 for a little test drive, using the >> 3.5.0rc1 builds. >> I tested two scenarios: A server running Fuseki, and command line tools >> operating directly on a database directory. >> 1. Server running Fuseki >> First the server (running as a VM). Currently I've been using Fuseki with >> HDT support, from the hdt-java repository. I'm serving a dataset of about >> 39M triples, which occasionally changes (eventually this will be updated >> once per month, or perhaps more frequently, even once per day). With HDT, I >> can simply rebuild the HDT file (less than 10 minutes) and then restart >> Fuseki. Downtime for the endpoint is only a few seconds. But I'm worried >> about the state of the hdt-java project, it is not being actively maintained >> and it's still based on Fuseki1. > > You don't need to use their Fuseki integration. > >> So I switched (for now) to Fuseki2 with TDB2. It was rather smooth thanks to >> the documentation that Andy provided. I usually create Fuseki2 datasets via >> the API (using curl), but I noticed that, like the UI, the API only supports >> "mem" and "tdb". So I created a "tdb" dataset first, then edited the >> configuration file so it uses tdb2 instead. >> Loading the data took about 17 minutes. I used wget for this, per Andy's >> example. This is a bit slower than regenerating the HDT, but acceptable >> since I'm only doing it occasionally. 
I also tested executing queries while >> reloading the data. This seemed to work OK even though performance obviously >> did suffer. But at least the endpoint remained up. >> The TDB2 directory ended up at 4.6GB. In contrast, the HDT file + index for >> the same data is 560MB. >> I reloaded the same data, and the TDB2 directory grew to 8.5GB, almost twice >> its original size. I understand that the TDB2 needs to be compacted >> regularly, otherwise it will keep growing. I'm OK with the large disk space >> usage if it's constant, not growing over time like TDB1. >> 2. Command line tools >> For this I used an older version of the same dataset with 30M triples, the >> same one I used for my HDT vs TDB comparison that I posted on the users >> mailing list: >> http://mail-archives.apache.org/mod_mbox/jena-users/201704.mbox/%3C90c0130b-244d-f0a7-03d3-83b47564c990%40iki.fi%3E >> This was on my i3-2330M laptop with 8GB RAM and SSD. > > Thank you for the figures. > >> Loading the data using tdb2.tdbloader took about 18 minutes (about 28k >> triples per second). The TDB2 directory is 3.7GB. In contrast, using >> tdbloader2, loading took 11 minutes and the TDB directory was 2.7GB. So TDB2 >> is slower to load and takes more disk space than TDB. > > Those are low figures for 40M. Lack of free RAM? (It's more acute with TDB2 > ATM as it does random I/O.) RDF syntax? A lot of long literals? > > Today: TDB2: > > INFO Finished: 50,005,630 bsbm-50m.nt.gz 738.81s (Avg: 67,684) > > >> I ran the same example query I used before on the TDB2. The first time was >> slow (33 seconds), but subsequent queries took 16.1-18.0 seconds. >> I also re-ran the same query on TDB using tdbquery on Jena 3.5.0rc1. The >> query took 13.7-14.0 seconds after the first run (24 seconds). >> I also reloaded the same data to the TDB2 to see the effect. Reloading took >> 11 minutes and the database grew to 5.7GB. Then I compacted it using >> tdb2.tdbcompact. 
Compacting took 18 minutes and the disk usage just grew >> further, to 9.7GB. The database directory then contained both Data-0001 and >> Data-0002 directories. I removed Data-0001 and disk usage fell to 4.0GB. Not >> quite the same as the original 3.7GB, but close. >> My impressions so far: It works, but it's slower than TDB and needs more >> disk space. Compaction seems to work, but initially it will just increase >> disk usage. The stale data has to be manually removed to actually reclaim >> any space. > >
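The manual cleanup described above (keep the newest `Data-NNNN` generation after `tdb2.tdbcompact`, then delete the older one) could be scripted. A minimal sketch, assuming the newest generation is the highest-numbered directory and that nothing (Fuseki or the `tdb2.*` tools) has the database open; it only lists removal candidates and deletes nothing. `StaleGenerations` is a hypothetical helper, not part of TDB2.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StaleGenerations {
    // Generation directories are assumed to look like Data-0001, Data-0002, ...
    private static final Pattern GENERATION = Pattern.compile("Data-(\\d+)");

    /** Lists Data-NNNN subdirectories of dbDir other than the highest-numbered one. */
    public static List<Path> staleGenerations(Path dbDir) throws IOException {
        List<Path> gens = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dbDir)) {
            for (Path p : ds) {
                if (Files.isDirectory(p) && GENERATION.matcher(p.getFileName().toString()).matches()) {
                    gens.add(p);
                }
            }
        }
        gens.sort(Comparator.comparingInt(StaleGenerations::generationNumber));
        // Everything except the last (newest) generation is a candidate for removal.
        return gens.isEmpty() ? gens : new ArrayList<>(gens.subList(0, gens.size() - 1));
    }

    private static int generationNumber(Path p) {
        Matcher m = GENERATION.matcher(p.getFileName().toString());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a generation directory: " + p);
        }
        return Integer.parseInt(m.group(1));
    }
}
```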
Re: Issues fixed in Apache Jena
One possibility: Jena does not (and I assume never did) enforce a "squash-before-merging" policy. That is to say, if I write a PR with ten commits, and it is approved, and we merge it, it will normally go in as all ten commits. Some projects demand that such a PR be "squashed" (all ten commits reduced into one with the sum of changes present) before merging. If that is part of the difference, I suppose it should show up in the same way as a difference between Jena and other projects in the number of commits per time unit in the main branch. ajs6f > On Nov 16, 2017, at 7:55 AM, Γεώργιος Δίγκας wrote: > > Dear All, > > I would like to thank you for your replies! > >>> What is a single issue in your context? > SonarQube uses a set of coding > rules (https://docs.sonarqube.org/display/SONAR/Issues) in order to measure > the TD. While running an analysis, it raises an issue every time a piece of > code breaks a coding rule. >>> I think what is being counted is any issue that SonarQube TD reports, and >>> this is being done on every single commit and summed together. This doesn’t >>> seem like a particularly meaningful statistic since you would inevitably >>> count the same issue N times where N is the number of commits between where >>> an issue was introduced and where it was fixed. It seems like there should >>> really be some attempts to perform de-duplication. > The number refers to unique issues and it does not include any duplication. > (If one issue was fixed and then after some time the same issue appeared in > the same piece of code, I count it as new.) >>> It also sounds like it doesn’t make any attempts to account for common >>> development practices, i.e. new code often develops over a series of commits >>> with developers implementing outlines first and then refining and cleaning >>> up a feature as it matures. > I totally agree with the last sentence.
As I said in my previous e-mails, the > clean-up rate on your project is the highest among the Apache projects > that I analyzed, and I am wondering why that is. What practices do you follow? > Is it a coincidence? >>> There are many (, many) minor things and they outweigh the major problems. >>> Calling them all "issues" gives them equal weight. Some are about >>> canonicalization of the code. > I have updated the previously sent spreadsheet > (https://docs.google.com/spreadsheets/d/1DloQ_GS9l2KS6ldgdHOQkjsCB1J_rrMyUauHC_Ymgfk/edit?usp=sharing). > Now on the sheet: Jena: Open Issues - October 7, 2017 I have added the > Severity and the Type of each issue and you can filter them based on these > two criteria (they are based on SonarQube's default classification). >>> NB the "issue" word has a specific meaning for JIRA which a lot of Apache >>> projects use. Jena's current total, now, is 1424. > Thank you for the clarification. I should have mentioned in my first e-mail that > I was referring to SonarQube's issues and not to Jira's. > > With kind regards, > > George Digkas > > From: Andy Seaborne > Sent: Thursday, November 16, 2017 12:55 PM > To: Γεώργιος Δίγκας; dev@jena.apache.org > Subject: Re: Issues fixed in Apache Jena > > Do not take git as complete! > > Jena started in 2000. > https://lists.w3.org/Archives/Public/www-rdf-interest/2000Aug/0128.html > > Jena 2.0 was released 2003-08-28. > A whole 40M including dependencies! A 14.7M zip file! > https://sourceforge.net/projects/jena/files/ > > The whole of SF SVN history was imported by the Apache infrastructure > team (a herculean effort) into Apache SVN. I don't know how to get to it > from git; it may not be there and only in SVN. > > The earliest git root commit is for the move to Apache from SF > [4298106f1e], 6 years ago.
(There are 4 root commits due to merges) > > --- > > It's an interesting start, and to make the analysis usefully inform the > reader as to the state of the project, I suggest treating different kinds > of issues differently, not as uniformly important. > > There are many (, many) minor things and they outweigh the major > problems. Calling them all "issues" gives them equal weight. Some are > about canonicalization of the code. > > Yet reformatting the whole code base (if practical, which is arguable) > then greatly decreases the usefulness of git history. That would be a > huge loss. > > (NB the "issue" word has a specific meaning for JIRA which a lot of > Apache projects use. Jena's current total, now, is 1424.) > > Andy > >> >> Thank you in advance! >> >> >> With kind regards, >> >> George Digkas
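The double-counting concern raised in this thread can be made concrete. Assume each commit's analysis yields a snapshot set of open-issue keys (the key format, such as rule plus file, is an assumption; SonarQube's own issue tracking is more elaborate). Summing snapshot sizes counts a long-lived issue once per commit, whereas taking the union counts it once. `IssueCounting` is a hypothetical illustration, not George's method.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IssueCounting {
    /** Naive total: every per-commit snapshot contributes all of its open issues. */
    public static int naiveTotal(List<Set<String>> snapshots) {
        int total = 0;
        for (Set<String> snapshot : snapshots) {
            total += snapshot.size();
        }
        return total;
    }

    /** De-duplicated total: each issue key is counted once, however many commits it spans. */
    public static int distinctTotal(List<Set<String>> snapshots) {
        Set<String> seen = new HashSet<>();
        for (Set<String> snapshot : snapshots) {
            seen.addAll(snapshot);
        }
        return seen.size();
    }
}
```

This union-based count would treat an issue that is fixed and later reappears as a single issue, whereas George counts such a reappearance as new; handling that would require tracking open/close transitions between consecutive snapshots.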
Re: Immutability
I think this is one reason that Clerezza introduced a new RDF API: http://clerezza.apache.org/apidocs/org/apache/clerezza/rdf/core/package-summary.html So it seems to me that if we want to introduce immutable types, we might want to do that in the context of a completely new API. Adopting the Java 8 Streams API has also been mooted as something that might merit a new Jena API (instead of mixing things up in the current one). I'm not sure how that plays out with ARQ, though. We would want people to be able to use the new types with ARQ without much difficulty. ajs6f Claude Warren wrote on 11/14/17 2:43 AM: In most cases I prefer immutable interfaces. However, immutable interfaces pose an interesting problem for contract testing and for the permissions implementation. In contract testing you have a producer that creates instances of an interface, and tests that you run against those instances. However, since there are no setters to call on an instance, you cannot know what the result of any particular getter should be. The only choices that I see for this case are: 1. Don't test the immutable interface separately, and therefore miss some implementations. That is, only test the immutable interface when paired with the mutable interface. 2. Modify the producer interface so that the producer will create the data necessary to execute the tests. This results in complicated producers. Keep in mind that contract tests allow us to write tests for the Graph interface and then create very simple suites for each implementation of Graph. This means that when we add a method or detect an incorrectly implemented method, we can modify the contract test and all implementations are then properly tested. In the permissions implementation we wrap the interfaces with dynamic proxies that intercept calls to verify whether the user has permission to make those calls before execution, wrap the results with "secured" versions, and in some cases filter results (e.g.
iterators). This system will be perfectly happy running against immutable interfaces. The interesting part is that the system can take mutable objects and return objects that throw exceptions when the user does not have access (much the same as the current read-only implementations do). But you cannot know *a priori* which methods will throw exceptions. This leads me to one more observation. When building the permissions layer I learned that simple objects like RDFNode can return complex objects like Model. I believe that an immutable model would have to return an immutable RDFNode. The signature of the Immutable RDFNode should indicate a return of an Immutable Model. But to be a drop-in replacement for a standard RDFNode it will need to return a Model. Classes like RDF lists also pose interesting problems. So while I like immutable interfaces in general, I think that retrofitting them here is problematic. Scan through the permissions layer for some idea of the complexity. Having written all of this, I think I have come to believe that low-level tools like Jena, or data stores in general, should not have immutable interfaces. Immutable interfaces belong at a slightly higher architectural level or at the extreme boundary of the project. For example, if Jena had a web service API that retrieved Models and such, then it might make sense for the deserialized versions to be immutable. Claude On Tue, Nov 14, 2017 at 12:12 AM, Adam Jacobs wrote: The subject of immutability was raised in JENA-1391 (https://issues.apache.org/jira/browse/JENA-1391). Specifically, the `getUnionModel` method in Jena 3.4 returns an immutable model view, and the implementation of the aforementioned story includes methods that will return an immutable dataset view. The question is whether these immutable views deserve their own interfaces. Currently, the views are returned using what I called "unexpected immutability" because they implement mutable interfaces.
This introduces the potential for `UnsupportedOperationException`s. Unfortunately, that (degenerate) pattern is used in Java's `Collections` utility as well (https://docs.oracle.com/javase/8/docs/api/java/util/Collections.html), but Scala is a clean example to draw inspiration from: by implementing immutable interfaces as parents of their mutable counterparts (rather than vice versa), we can satisfy the Liskov Substitution Principle. Obviously, that solution is easier to implement from scratch than in an existing code base, but I imagine it could be done in multiple phases, by introducing the new interfaces and using them in new methods (with easy conversion to mutability via union) while gradually retrofitting older methods. The question, then, is whether such a change is worthwhile...
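The arrangement Adam describes, with read-only interfaces as parents of their mutable counterparts, can be sketched in a few lines of Java. Every mutable instance can then be passed wherever a read-only view is expected (the Liskov-safe direction), and no read-only type advertises mutators that throw `UnsupportedOperationException`. `TripleSet`, `MutableTripleSet`, and the string-encoded triples are hypothetical illustrations, not proposed Jena names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class ImmutableParentSketch {

    /** Read-only parent: queries only, so implementations never need to throw. */
    public interface TripleSet {
        boolean contains(String triple);
        int size();
    }

    /** Mutable child: extends the read-only contract with mutators. */
    public interface MutableTripleSet extends TripleSet {
        void add(String triple);
        void remove(String triple);
    }

    /** One implementation serves both roles. */
    public static final class SimpleTripleSet implements MutableTripleSet {
        private final Set<String> triples = new LinkedHashSet<>();
        @Override public boolean contains(String triple) { return triples.contains(triple); }
        @Override public int size() { return triples.size(); }
        @Override public void add(String triple) { triples.add(triple); }
        @Override public void remove(String triple) { triples.remove(triple); }
    }

    /** A consumer that only reads declares the parent type, so it accepts either kind. */
    public static int countMatching(TripleSet ts, String... candidates) {
        int n = 0;
        for (String c : candidates) {
            if (ts.contains(c)) {
                n++;
            }
        }
        return n;
    }
}
```

Note that the parent type only guarantees read-only access through that reference; the underlying object may still change. A genuinely immutable type would need an implementation with no mutators at all, which is part of the retrofitting difficulty Claude describes.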
Re: Issues fixed in Apache Jena
It's not really clear to me how to answer these questions without more context. How did you go about making these calculations? What span of time does your analysis cover? What are you counting as technical debt (anything that SonarQube claims is "technical debt")? Are you comparing Jena to other projects with a similar lifespan? Are you comparing Jena to projects that have a similar contribution history? etc. etc. ajs6f Γεώργιος Δίγκας wrote on 11/15/17 10:15 AM: Dear developers, I am a PhD student at the University of Groningen and the topic of my PhD is the evolution of Technical Debt (TD) in open-source development. I have analyzed some projects from the Apache Foundation (using SonarQube) and I realized that your project has a tremendous number (405,700) of fixed issues, when compared to other projects from Apache. I would like to ask you the following 3 questions: 1. Why were so many TD issues introduced into your project? 2. Were those issues fixed on purpose or by coincidence? 3. Do you use SonarQube (or SonarLint) in order to detect and fix the issues? Thank you in advance! With kind regards, George Digkas
[GitHub] jena issue #307: JENA-1418: Upgrade some versions
Github user ajs6f commented on the issue: https://github.com/apache/jena/pull/307 The Commons Lang jump is more complicated because of the issues around ISO date format. ---
Re: Gitbox?
Daniel-- We have begun to discuss this on dev@jena, and one question that immediately came up is how this plays with JIRA integration. Currently we have a system in which any mention of any extant JIRA ticket in a Github PR starts copying the comments in that PR over to that JIRA ticket, which is useful. Can we assume that the same integration will work the same way if we go to "Github as canonical"? Are there any further integrations available after choosing "Github as canonical", e.g. create-JIRA-ticket-on-PR or the like? Thanks for info! ajs6f Daniel Pono Takamori wrote on 11/9/17 1:22 PM: Gitbox would indeed allow you to have the Github tools available to committers. It treats Github as the canonical source (we also keep a copy on Gitbox), which allows the PRs and issues to be a bit more convenient (there are still some things we can't support due to Github's coarse permission structure). We require all committers to use Github's 2FA [0], so once you have taken a vote in the project, file a ticket on the INFRA JIRA [1] and then your committers can run through the Gitbox syncing [2] to match up ASF IDs and Github IDs. Let us know if you have any other questions. [0] - https://help.github.com/articles/providing-your-2fa-authentication-code/ [1] - https://issues.apache.org/jira/browse/INFRA [2] - https://gitbox.apache.org/setup/ On Thu, Nov 9, 2017 at 10:50 AM, wrote: Hi, I'm a committer for Jena. Recently, we had some discussion about our source management and there was some uncertainty about how we can arrange the relationship between Github and Apache git. Currently, commits go against Apache git, and Github picks them up and mirrors them, which is a bit annoying in that it's not possible to use the Github PR review machinery transparently. This is not a big deal, but is it in fact possible to do that? In other words, is it possible to (e.g.) merge PRs at Github and have Apache git pick up the change?
I went to https://gitbox.apache.org/setup/ and linked my accounts, but that didn't seem to do anything... Thanks for any info! -- ajs6f
Re: Gitbox?
Yes, I'm on this! First question, what is a [DISCUSS]? :) I assume you are talking about a dev@ thread with that label to discuss this possibility, but is there more to it than that? As I understand it now, "it" is changing from our current setup, in which we act against Apache git and the results are mirrored to Github, to the opposite direction, in which we act against Github and the results are mirrored to Apache git. As far as PR machinery goes, an advantage that I see is that we will be able to use the complete, excellent web UI at Github. Right now, we can comment, review, etc., but when it comes time to merge, it occurs via CLI. It's true that that isn't the end of the world, but it is both clunky (as Andy notes) and error-prone (I had an annoying problem with it on a recent PR). We would also get accurate results from Github's visualization tools, which isn't a major thing, but could be nice. My understanding is that such a change will have no effect at all on our current JIRA integration, but I will get that confirmed (or disproved!) by INFRA. Going to Github issues would be a different choice, and I am not arguing for that now. (Trying to split as much off of this as possible to keep the decision simpler!) Bruno-- if it's not obvious, I am intentionally splitting off the question of where we maintain our site, which is a really good thing to discuss, but I think it is orthogonal. ajs6f Andy Seaborne wrote on 11/11/17 10:41 AM: On 09/11/17 18:24, aj...@apache.org wrote: Great, thanks! So folks, is there interest in pursuing this rearrangement of our source management? I would certainly vote for it. Great - do you want to take this on and see us through the process? We'll need a [DISCUSS], I expect, to make sure we all know what the VOTE is really about. I'm not clear what "it" is exactly. I'm all linked up - 2FA etc, and it can see which repo I have access to.
I use JIRA search and git history more and more to understand what users are asking about and to trace down bugs. How does gitbox interact with JIRA? Do we get a nice set of JIRA comments? (Obvious existing bug: JIRA isn't markdown, so `` and {{}} mess up.) I guess discussion is on GH issues, not JIRA tickets, which is a change but fine by me. The "git pull github pull/XXX/head --no-ff ; git push" is clunky but not the end of the world. (Unrelated wish: deleting the local branch. Is that "git fetch --prune"?) "it's not possible to use the Github PR review machinery transparently" This confused me - we use GH PR review at the moment. Could you expand on that point? So I guess I want to know what changes in the workflows for submission (nothing, presumably, but what about JIRA? Auto create-JIRA-on-PR would be fancy) and for acceptance. I'm expecting changes if we go "gitbox" and that's fine; I'm not arguing for the status quo. It's not easy to reverse, so I want to know what it is. Andy https://github.com/apache/jena/blob/master/CONTRIBUTING.md ajs6f Daniel Pono Takamori wrote on 11/9/17 1:22 PM: Gitbox would indeed allow you to have the Github tools available to committers. It treats Github as the canonical source (we also keep a copy on Gitbox), which allows the PRs and issues to be a bit more convenient (there are still some things we can't support due to Github's coarse permission structure). We require all committers to use Github's 2FA [0], so once you have taken a vote in the project, file a ticket on the INFRA JIRA [1] and then your committers can run through the Gitbox syncing [2] to match up ASF IDs and Github IDs. Let us know if you have any other questions. [0] - https://help.github.com/articles/providing-your-2fa-authentication-code/ [1] - https://issues.apache.org/jira/browse/INFRA [2] - https://gitbox.apache.org/setup/ On Thu, Nov 9, 2017 at 10:50 AM, wrote: Hi, I'm a committer for Jena.
Recently, we had some discussion about our source management and there was some uncertainty about how we can arrange the relationship between Github and Apache git. Currently, commits go against Apache git, and Github picks them up and mirrors them, which is a bit annoying in that it's not possible to use the Github PR review machinery transparently. This is not a big deal, but is it in fact possible to do that? In other words, is it possible to (e.g.) merge PRs at Github and have Apache git pick up the change? I went to https://gitbox.apache.org/setup/ and linked my accounts, but that didn't seem to do anything... Thanks for any info! -- ajs6f
Re: Gitbox?
Great, thanks! So folks, is there interest in pursuing this rearrangement of our source management? I would certainly vote for it. ajs6f Daniel Pono Takamori wrote on 11/9/17 1:22 PM: Gitbox would indeed allow you to have the Github tools available to committers. It treats Github as the canonical source (we also keep a copy on Gitbox), which allows the PRs and issues to be a bit more convenient (there are still some things we can't support due to Github's coarse permission structure). We require all committers to use Github's 2FA [0], so once you have taken a vote in the project, file a ticket on the INFRA JIRA [1] and then your committers can run through the Gitbox syncing [2] to match up ASF IDs and Github IDs. Let us know if you have any other questions. [0] - https://help.github.com/articles/providing-your-2fa-authentication-code/ [1] - https://issues.apache.org/jira/browse/INFRA [2] - https://gitbox.apache.org/setup/ On Thu, Nov 9, 2017 at 10:50 AM, wrote: Hi, I'm a committer for Jena. Recently, we had some discussion about our source management and there was some uncertainty about how we can arrange the relationship between Github and Apache git. Currently, commits go against Apache git, and Github picks them up and mirrors them, which is a bit annoying in that it's not possible to use the Github PR review machinery transparently. This is not a big deal, but is it in fact possible to do that? In other words, is it possible to (e.g.) merge PRs at Github and have Apache git pick up the change? I went to https://gitbox.apache.org/setup/ and linked my accounts, but that didn't seem to do anything... Thanks for any info! -- ajs6f
[GitHub] jena pull request #304: JENA-1418: Upgrading minor dependencies and plugins
GitHub user ajs6f opened a pull request: https://github.com/apache/jena/pull/304 JENA-1418: Upgrading minor dependencies and plugins You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajs6f/jena JENA-1418 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/jena/pull/304.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #304 commit 548770743dff4dae6a26ae40c8d1abaf613a9daf Author: ajs6f Date: 2017-11-04T14:49:41Z Upgrading commons-io and commons-codec commit 75d17cdbb2b788e5fc270226863cb0dd0a911c83 Author: ajs6f Date: 2017-11-04T14:53:20Z Upgrading commons-csv commit 589fe44ada8a78985af5389efc2593017b51eb61 Author: ajs6f Date: 2017-11-04T15:00:18Z Upgrading Log4j2 commit dd5996150add112587d4a31fd56dd062be21f160 Author: ajs6f Date: 2017-11-04T15:05:17Z Upgrading contract test dependencies commit 9f93514b78099076257836e754a63253ab71a3e0 Author: ajs6f Date: 2017-11-04T15:22:30Z Maven plugin updates commit 6c100d92be885f13c31c2674ead85ba2a7f2b383 Author: ajs6f Date: 2017-11-04T15:29:53Z Moving Shiro dependency management commit 56f2f6bd7641a408a9821d690d0f961f03ff796d Author: ajs6f Date: 2017-11-04T15:47:09Z Upgrading Shiro ---
[GitHub] jena pull request #289: Version bumps for 3.5
Github user ajs6f closed the pull request at: https://github.com/apache/jena/pull/289 ---
[GitHub] jena issue #289: Version bumps for 3.5
Github user ajs6f commented on the issue: https://github.com/apache/jena/pull/289 I'm closing this PR because for such a small changeset, it's not worth figuring out how my delta got screwed up with the tabs in the `pom.xml`. I'll just open a fresh PR in a day or three, for JENA-1418. ---
[GitHub] jena issue #303: JENA-1408: Quicker -Pdev; simplify profiles.
Github user ajs6f commented on the issue: https://github.com/apache/jena/pull/303 What's here looks good, but I don't see where `jena-iri` and `jena-shaded-guava` are being added in? ---
Re: [VOTE] Release Apache Jena 3.5.0 (RC2)
Yeah, I was somehow missing your key. Weird, I thought I had imported it a long time ago. Oh, well, all good on that front. +1 to the release. ajs6f Andy Seaborne wrote on 10/30/17 10:34 AM: On 30/10/17 14:04, aj...@apache.org wrote: I got a clean build with Mac OS X, Maven 3.5.0, Java version: 1.8.0_65, vendor: Oracle Corporation. However, when checking the sigs, I'm getting: ➜ /tmp gpg --verify apache-jena-3.5.0.tar.gz.asc apache-jena-3.5.0.tar.gz gpg: Signature made Mon Oct 30 05:47:51 2017 EDT gpg:using RSA key 04C95136D236A58F gpg: Can't check signature: No public key And I can't find a sig with that string in the MIT keyserver... Andy, did you change keys recently? Not for a while. Search for "seaborne" and I see pub 4096R/D236A58F 2016-11-04 Andy Seaborne (Code signing key) and link to: https://pgp.mit.edu/pks/lookup?op=get&search=0x04C95136D236A58F and the public key is in the KEYS file. gpg --import KEYS pgp < KEYS Andy ajs6f Osma Suominen wrote on 10/30/17 8:50 AM: Thanks for preparing the second RC Andy! Excellent work, and very timely, considering that several problems were found with the RC1 build late last week and you got all fixes integrated already! I tried to build RC2 on two different Ubuntu 16.04 machines. They have slightly different Java and Maven versions. On one machine the build (using "mvn clean install") went fine. Maven 3.3.9, Java 1.8.0_131 (OpenJDK). On the other (Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T18:41:47+02:00), Java 1.8.0_151 / Oracle) I first got this: [INFO] BUILD FAILURE [INFO] [INFO] Total time: 07:48 min [INFO] Finished at: 2017-10-30T13:53:07+02:00 [INFO] Final Memory: 101M/829M [INFO] [ERROR] Failed to execute goal com.github.alexcojocaru:elasticsearch-maven-plugin:5.2:runforked (start-elasticsearch) on project jena-text-es: Condition returned by method "waitToStart" in class com.github.alexcojocaru.mojo.elasticsearch.v2.client.Monitor was not fulfilled within 30 seconds. 
-> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :jena-text-es [INFO] Stopping the Elasticsearch process at application shutdown ... [INFO] ... the Elasticsearch process has stopped. Exit code: 143 [INFO] Elasticsearch [0] stopped with exit code 143 So apparently Elasticsearch didn't start properly for the jena-text-es integration tests. I happen to have Elasticsearch running on this machine but IIRC the tests should use a non-standard TCP port, so the two Elasticsearch instances shouldn't interfere with each other. I just resumed the build without doing anything else, and it worked the second time, so maybe it was just a random transient error. But then I got this: [INFO] BUILD FAILURE [INFO] [INFO] Total time: 02:20 min [INFO] Finished at: 2017-10-30T14:25:55+02:00 [INFO] Final Memory: 89M/788M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.6:single (create-zip-assembly) on project apache-jena-fuseki: Execution create-zip-assembly of goal org.apache.maven.plugins:maven-assembly-plugin:2.6:single failed: group id '300' is too big ( > 2097151 ). Use STAR or POSIX extensions to overcome this limit -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :apache-jena-fuseki [INFO] The Elasticsearch process has already stopped. Nothing to clean up According to the error above, as well as [1], this can be fixed by using POSIX tar format. I think this happens because on this machine (administered by my employer, University of Helsinki, with LDAP authentication) my user account has a high group ID (300), while the other machine is a personal laptop where I've installed Ubuntu myself and my group ID is 1000. I can
Re: [VOTE] Release Apache Jena 3.5.0 (RC2)
I got a clean build with Mac OS X, Maven 3.5.0, Java version: 1.8.0_65, vendor: Oracle Corporation. However, when checking the sigs, I'm getting: ➜ /tmp gpg --verify apache-jena-3.5.0.tar.gz.asc apache-jena-3.5.0.tar.gz gpg: Signature made Mon Oct 30 05:47:51 2017 EDT gpg:using RSA key 04C95136D236A58F gpg: Can't check signature: No public key And I can't find a sig with that string in the MIT keyserver... Andy, did you change keys recently? ajs6f Osma Suominen wrote on 10/30/17 8:50 AM: Thanks for preparing the second RC Andy! Excellent work, and very timely, considering that several problems were found with the RC1 build late last week and you got all fixes integrated already! I tried to build RC2 on two different Ubuntu 16.04 machines. They have slightly different Java and Maven versions. On one machine the build (using "mvn clean install") went fine. Maven 3.3.9, Java 1.8.0_131 (OpenJDK). On the other (Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T18:41:47+02:00), Java 1.8.0_151 / Oracle) I first got this: [INFO] BUILD FAILURE [INFO] [INFO] Total time: 07:48 min [INFO] Finished at: 2017-10-30T13:53:07+02:00 [INFO] Final Memory: 101M/829M [INFO] [ERROR] Failed to execute goal com.github.alexcojocaru:elasticsearch-maven-plugin:5.2:runforked (start-elasticsearch) on project jena-text-es: Condition returned by method "waitToStart" in class com.github.alexcojocaru.mojo.elasticsearch.v2.client.Monitor was not fulfilled within 30 seconds. -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
[ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :jena-text-es [INFO] Stopping the Elasticsearch process at application shutdown ... [INFO] ... the Elasticsearch process has stopped. Exit code: 143 [INFO] Elasticsearch [0] stopped with exit code 143 So apparently Elasticsearch didn't start properly for the jena-text-es integration tests. I happen to have Elasticsearch running on this machine, but IIRC the tests should use a non-standard TCP port, so the two Elasticsearch instances shouldn't interfere with each other. I just resumed the build without doing anything else, and it worked the second time, so maybe it was just a random transient error. But then I got this: [INFO] BUILD FAILURE [INFO] [INFO] Total time: 02:20 min [INFO] Finished at: 2017-10-30T14:25:55+02:00 [INFO] Final Memory: 89M/788M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.6:single (create-zip-assembly) on project apache-jena-fuseki: Execution create-zip-assembly of goal org.apache.maven.plugins:maven-assembly-plugin:2.6:single failed: group id '300' is too big ( > 2097151 ). Use STAR or POSIX extensions to overcome this limit -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :apache-jena-fuseki [INFO] The Elasticsearch process has already stopped.
Nothing to clean up. According to the error above, as well as [1], this can be fixed by using POSIX tar format. I think this happens because on this machine (administered by my employer, University of Helsinki, with LDAP authentication) my user account has a high group ID (300), while the other machine is a personal laptop where I've installed Ubuntu myself and my group ID is 1000. I can try to fix this in a PR, but I don't think it's release critical. It's probably not a new issue; I just haven't done a full Jena build on this machine, at least not since it was reinstalled some months ago. -Osma [1] https://maven.apache.org/plugins/maven-assembly-plugin/faq.html#tarFileModes Andy Seaborne wrote on 30.10.2017 at 13:20: Hi, Here is a vote on a release of Jena 3.5.0. This is the second proposed candidate for a 3.5.0 release. Note - the deadline is 18:00 UTC on Thursday - not midnight - so th
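Where the 2097151 in the error message comes from: the classic ustar tar header stores each numeric ID in an 8-byte field holding up to 7 octal digits plus a terminator, so the largest representable value is 8^7 - 1 = 2097151. Larger IDs need the POSIX (pax) or GNU extensions the message refers to. A minimal sketch of that arithmetic, assuming the classic ustar layout; the class and method names are hypothetical and this is not the assembly plugin's own code.

```java
/**
 * Hypothetical sketch: the numeric limit of a classic ustar uid/gid field.
 * Assumption: the field is 8 bytes, i.e. 7 octal digits plus a terminator.
 */
public class UstarIdLimit {
    // Octal digits available in the 8-byte uid/gid header field.
    static final int OCTAL_DIGITS = 7;

    // Largest value with 7 octal digits: 8^7 - 1 = 2^21 - 1.
    static long maxId() {
        return (1L << (3 * OCTAL_DIGITS)) - 1;
    }

    static boolean fitsInUstar(long id) {
        return id >= 0 && id <= maxId();
    }

    // Render an ID the way a ustar header would store it (zero-padded octal),
    // or fail with a message shaped like the one in the build log.
    static String toOctalField(long id) {
        if (!fitsInUstar(id)) {
            throw new IllegalArgumentException(
                    "group id '" + id + "' is too big ( > " + maxId() + " )");
        }
        return String.format("%07o", id);
    }

    public static void main(String[] args) {
        System.out.println(maxId());                 // 2097151
        System.out.println(toOctalField(1000));      // 0001750
        System.out.println(fitsInUstar(3_000_000L)); // false
    }
}
```

The POSIX tar format lifts the cap because pax extended headers can carry IDs as decimal strings of arbitrary length instead of a fixed octal field.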
Re: Jena 3.5.0 RC2 plan
Back from Vienna! Master just built beautifully for me on Mac OSX, from commit 92c793b67dbb4138858106774d57b23418dd4ae5. ajs6f Andy Seaborne wrote on 10/27/17 4:42 PM: Currently on master: All the PRs are integrated (minimal version of #297 - tests are much faster, with a build taking about 12-13 minutes). It's run a couple of times on Apache Jenkins as well. If anyone gets the chance to run from master on Windows or OSX, that would be great. TestDatabaseOps should be fixed. TestProcessFileLock on Windows may well be, but IMO it is not a blocker to a release - it's a test setup issue. Andy On 27/10/17 14:53, Andy Seaborne wrote: There are test problems with TestProcessFileLock (on Windows, consistently) and TestDatabaseOps (intermittent but seems to like picking on Bruno). There are a couple of other small PRs for fixes which look safe as well. PR #294 AdapterFileManager fix (Rob - JENA-1405 can be resolved?) PR #297 Elephas testing speed up (slight discussion ongoing about details) TestDatabaseOps: PR #295 Use @Rule in testing to isolate tests. PR #296 Control the test in disk-touching DBOE modules. TestProcessFileLock: PR #298 Isolate tests with a @Rule and improve/fix lock release. I don't intend to rebuild the javadoc. As you may have noticed, I had some "fun" trying to get it staged. This is cutting corners but no public API is changing. If necessary, it can be done after pushing the release out. Master is at version 3.6.0-SNAPSHOT so it goes back. (and switching to a unified version number makes that easier) To smooth the reset, mainly for any of us who have switched to the latest snapshot, I'll apply the PRs, we can test the snapshot, then reset back to 3.5.0-SNAPSHOT just before the RC2 release build. Please don't push to master without also letting dev@ know to make sure I don't miss anything. Good plan? Anything missing? Andy
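The idea behind isolating disk-touching tests (PRs #295 and #298) is that each test gets its own fresh directory and cleans it up afterwards, so parallel or repeated runs cannot trip over each other's files. A minimal sketch of that idea with plain java.nio, assuming nothing about how the Jena PRs actually implement it (with JUnit 4 the usual route is the `TemporaryFolder` rule declared with `@Rule`); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.function.Consumer;
import java.util.stream.Stream;

/**
 * Hypothetical helper: give each test its own scratch directory and remove it
 * afterwards, so tests that touch the disk cannot see each other's files.
 */
public class IsolatedDirs {

    // Run 'test' against a brand-new temporary directory, then delete it
    // (and everything the test created inside it), even if the test throws.
    static void withFreshDirectory(String prefix, Consumer<Path> test) {
        Path dir;
        try {
            dir = Files.createTempDirectory(prefix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            test.accept(dir);
        } finally {
            deleteRecursively(dir);
        }
    }

    // Delete children before parents by walking the tree in reverse order.
    static void deleteRecursively(Path root) {
        if (!Files.exists(root)) return;
        try (Stream<Path> walk = Files.walk(root)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        withFreshDirectory("tdb-test", dir -> System.out.println("using " + dir));
    }
}
```

Because every call creates a new directory, two tests running at the same time never share a database location, which is exactly the kind of interference suspected in the TestDatabaseOps failures.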
Re: @Test TestDatabaseOps.compact_prefixes_3 [Was Re: [] Release Apache Jena 3.5.0]
I did notice one warning when running the tests, but most likely unrelated, and expected for some test. 09:18:56 WARN TDB :: Location /home/kinow/Development/java/jena/jena/jena-tdb/target/tdb-testing/DB/ was not locked, if another JVM accessed this location simultaneously data corruption may have occurred. I see that all the time. I'm ashamed to admit I've never looked too closely into it. It's never errored me out of a build or appeared to have any consequence. ajs6f Bruno P. Kinoshita wrote on 10/25/17 10:22 PM: Morning Andy, I have access to a Windows machine at work where I can quickly start a build later this week. Thanks for looking into it. Decided to reply to e-mails and provide more information before leaving for the office, so that I had another chance at running the tests in this environment where the bugs always happen - i.e. not intermittent in my local workstation from what I can tell. 1/ Could you try running the test in isolation? It should run in Eclipse by pointing at the test and running just that one @Test. Sorry, I should have included this in the previous e-mail. Running the test in isolation in Eclipse works. Running the test in isolation in Maven also works (i.e. mvn clean test install -Dtest=TestDatabaseOps -DfailIfNoTests=no). 2/ Run with mvn -fn (--fail-never) which should make maven run the other modules and tests so showing if there are any other problems. Managed to wait just until this run of `mvn clean test install -Dmaven.javadoc.skip -fn` passed the TDB2 project. The issue happened, but I couldn't wait for the other tests to run. Gotta rush to the office. I did notice one warning when running the tests, but most likely unrelated, and expected for some test.
09:18:56 WARN TDB :: Location /home/kinow/Development/java/jena/jena/jena-tdb/target/tdb-testing/DB/ was not locked, if another JVM accessed this location simultaneously data corruption may have occurred. Going to build the project at work, and send a stack trace (didn't see one last night, but was running the test just before calling it a day). Cheers, Bruno From: Andy Seaborne To: dev@jena.apache.org Sent: Thursday, 26 October 2017 6:33 AM Subject: @Test TestDatabaseOps.compact_prefixes_3 [Was Re: [] Release Apache Jena 3.5.0] Bruno, The absence of the Data-0001 is very strange. It is created for every test in the DatabaseMgr.connectDatasetGraph call and other tests work. I don't have access to OSX or Windows for testing and compaction is playing around with directories and files on disk (java, portable, ...) but the code for compaction, and its tests, is quite recent. I did run a build+test with the source-release zip file on Linux. A few things to try: 1/ Could you try running the test in isolation? It should run in Eclipse by pointing at the test and running just that one @Test. Is there a stacktrace? 2/ Run with mvn -fn (--fail-never) which should make maven run the other modules and tests so showing if there are any other problems. 3/ I can see that parallel tests would mess up the test setup. I've just read the surefire plugin docs and I'm still not sure what the default is - it might be thread per core. Can you try forcing no parallel tests please? [*] I can't see a place for an NPE on line 142, I'm guessing it comes in the Txn.executeRead that follows. g.getPrefixMapping().getNsURIPrefix( and I'll guess that it's the second object access (getPrefixMapping()) but it might be deeper - no stack trace? I can't see how that connects to the absence of Data-0001. [*] Module jena-tdb2 does not set up surefire and relies on defaults. It should run TC_TDB org.apache.maven.plugins maven-surefire-report-plugin **/TC_*.java Andy On 25/10/17 12:19, Bruno P.
Kinoshita wrote: I think one of the tests is failing when I run `mvn clean test install`, and also when I run the same `mvn clean test -e -X -DforkMode=never` in debug mode in Eclipse. The compact_prefixes_3 test method expects a directory like DB/Data-0001, but there is only a DB folder. The methods to create the Data-0001 switchable location were called, but for some reason nothing happened. Didn't have much time to thoroughly investigate it, so will have to leave the error here for others to take a look. Will have more time to look into it tomorrow evening NZ time. Running org.apache.jena.tdb2.sys.TestDatabaseOps Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.028 sec <<< FAILURE! - in org.apache.jena.tdb2.sys.TestDatabaseOps compact_prefixes_3(org.apache.jena.tdb2.sys.TestDatabaseOps) Time elapsed: 0.053 sec <<< ERROR! java.lang.NullPointerException at org.apache.jena.tdb2.sys.TestDatabaseOps.compact_prefixes_3(TestDatabaseOps.java:142) Running org.apache.jena.tdb2.
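When a test expects DB/Data-0001 but finds only DB, an explicit check turns a later NullPointerException into a readable failure. A minimal sketch of such a check, assuming the `Data-` plus four-digit naming seen in this report; `DataDirCheck`, `latestDataDir` and `requireDataDir` are hypothetical names, not part of TDB2's API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class DataDirCheck {
    // Assumed naming scheme, taken from the report: "Data-" plus four digits.
    private static final Pattern DATA_DIR = Pattern.compile("Data-(\\d{4})");

    // Highest-numbered Data-NNNN subdirectory of 'db', or empty if there is
    // none (including when 'db' itself does not exist).
    static Optional<Path> latestDataDir(Path db) {
        if (!Files.isDirectory(db)) return Optional.empty();
        try (Stream<Path> children = Files.list(db)) {
            return children
                    .filter(Files::isDirectory)
                    .filter(p -> DATA_DIR.matcher(p.getFileName().toString()).matches())
                    .max(Comparator.comparingInt(DataDirCheck::index));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Numeric part of a Data-NNNN name, e.g. 1 for "Data-0001".
    private static int index(Path p) {
        Matcher m = DATA_DIR.matcher(p.getFileName().toString());
        if (!m.matches()) throw new IllegalArgumentException("Not a data directory: " + p);
        return Integer.parseInt(m.group(1));
    }

    // Fail early with a clear message instead of an NPE later in the test.
    static Path requireDataDir(Path db) {
        return latestDataDir(db).orElseThrow(() ->
                new IllegalStateException("No Data-NNNN directory under " + db));
    }
}
```

A test could call `requireDataDir(dbLocation)` right after setting up the database, so a missing data directory is reported as a setup problem rather than surfacing as an NPE in the prefix checks.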
Re: [] Release Apache Jena 3.5.0
That sounds good to me. ajs6f Bruno P. Kinoshita wrote on 10/25/17 10:04 PM: Hi Andy, I suspect either a file not really being deleted by the JVM (i.e. IOX or Commons IO might be failing to do that), or a hidden bug somewhere else, not really in the compaction step. And +1 on not blocking the release. Perhaps we could simply mark that test with an @Ignore + a comment about a known issue, and proceed with the release. Cheers, Bruno From: Andy Seaborne To: dev@jena.apache.org Sent: Thursday, 26 October 2017 5:50 AM Subject: Re: [] Release Apache Jena 3.5.0 Bruno, Thank you for running these tests. == What to do about the 3.5.0 release TDB2 is marked as experimental and I don't know how else to break the deadlock of not getting used for real except by a release. I've hammered as much as I can. The absence of Data-0001 suggests it is a test setup/teardown problem, not a compaction problem. Compaction is currently quite difficult to access (it isn't available live from Fuseki). Options: * Pull 3.5.0 and fix it. Unbounded wait. * Remove TDB2 etc from 3.5.0. * (if possible) identify the problem, then release with a note attached. We can't have one part of Jena blocking the rest - there are still all the incremental improvements and all the contributions to get out. It's a compaction test and compaction does not remove data. The previous version of the database is accessed read-only (with writers locked out, but it is a read transaction). However I'm biased. Andy On 25/10/17 12:19, Bruno P. Kinoshita wrote: I think one of the tests is failing when I run `mvn clean test install`, and also when I run the same `mvn clean test -e -X -DforkMode=never` in debug mode in Eclipse. The compact_prefixes_3 test method expects a directory like DB/Data-0001, but there is only a DB folder. The methods to create the Data-0001 switchable location were called, but for some reason nothing happened.
Didn't have much time to thoroughly investigate it, so will have to leave the error here for others to take a look. Will have more time to look into it tomorrow evening NZ time. Running org.apache.jena.tdb2.sys.TestDatabaseOps Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.028 sec <<< FAILURE! - in org.apache.jena.tdb2.sys.TestDatabaseOps compact_prefixes_3(org.apache.jena.tdb2.sys.TestDatabaseOps) Time elapsed: 0.053 sec <<< ERROR! java.lang.NullPointerException at org.apache.jena.tdb2.sys.TestDatabaseOps.compact_prefixes_3(TestDatabaseOps.java:142) Running org.apache.jena.tdb2.sys.TestSys Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec - in org.apache.jena.tdb2.sys.TestSys Running org.apache.jena.tdb2.sys.TestDatabaseConnection Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.103 sec - in org.apache.jena.tdb2.sys.TestDatabaseConnection Running org.apache.jena.tdb2.assembler.TestTDBAssembler Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.449 sec - in org.apache.jena.tdb2.assembler.TestTDBAssembler Running org.apache.jena.tdb2.TestDatabaseMgr Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.088 sec - in org.apache.jena.tdb2.TestDatabaseMgr Running org.apache.jena.tdb2.solver.TestSolverTDB Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.048 sec - in org.apache.jena.tdb2.solver.TestSolverTDB Running org.apache.jena.tdb2.solver.TestStats Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.013 sec - in org.apache.jena.tdb2.solver.TestStats Results : Tests in error: TestDatabaseOps.compact_prefixes_3:142 » NullPointer Tests run: 537, Failures: 0, Errors: 1, Skipped: 7 [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Jena - Project .. SUCCESS [ 1.902 s] [INFO] Apache Jena - Shadowed external libraries .. SUCCESS [ 16.081 s] [INFO] Apache Jena - IRI .. SUCCESS [ 5.970 s] [INFO] Apache Jena - Base Common Environment .. 
SUCCESS [ 16.289 s] [INFO] Apache Jena - Core . SUCCESS [01:28 min] [INFO] Apache Jena - ARQ (SPARQL 1.1 Query Engine) SUCCESS [01:30 min] [INFO] Apache Jena - RDF Connection ... SUCCESS [ 8.349 s] [INFO] Apache Jena - TDB (Native Triple Store) SUCCESS [ 24.995 s] [INFO] Apache Jena - Database Operation Environment ... SUCCESS [ 0.201 s] [INFO] Apache Jena - DBOE Base SUCCESS [ 9.072 s] [INFO] Apache Jena - DBOE Transactions SUCCESS [ 7.257 s] [INFO] Apache Jena - DBOE Indexes . SUCCESS [ 4.289 s] [INFO] Apache Jena - DBOE Index test suite SUCCESS [ 4.663 s] [INFO] Apache Jena - DBOE Transactional Datastructures SUCCESS [ 11.756 s] [INFO] Ap
Re: [] Release Apache Jena 3.5.0
I don't understand-- I thought the release is _always and only_ the source-- the artifacts are just a convenience we supply...? ajs6f Andy Seaborne wrote on 10/25/17 9:37 PM: On 25/10/17 20:05, aj...@apache.org wrote: Possible option: change the default Maven profile to skip TDB2 for this 3.5.0 release? We buy ourselves some time (at least until a potential 3.5.1 with a stabilized TDB2 test regime) but we still keep TDB2 as available as possible. Thanks for the suggestion - users consume Jena as maven artifacts and these match the release. I'm quite uncomfortable with releasing with the main profile unstable, but I also don't want to block release on a final fix for whatever Bruno has come across, and I also want to get TDB2 out there so that people can start to mess with it (in the good way). I am still in Vienna, so only 50% on at most, but I will try to reproduce Bruno's report. ajs6f Andy Seaborne wrote on 10/25/17 6:50 PM: Bruno, Thank you for running these tests. == What to do about the 3.5.0 release TDB2 is marked as experimental and I don't know how else to break the deadlock of not getting used for real except by a release. I've hammered as much as I can. The absence of Data-0001 suggests it is a test setup/teardown problem, not a compaction problem. Compaction is currently quite difficult to access (it isn't available live from Fuseki). Options: * Pull 3.5.0 and fix it. Unbounded wait. * Remove TDB2 etc from 3.5.0. * (if possible) identify the problem, then release with a note attached. We can't have one part of Jena blocking the rest - there are still all the incremental improvements and all the contributions to get out. It's a compaction test and compaction does not remove data. The previous version of the database is accessed read-only (with writers locked out, but it is a read transaction). However I'm biased. Andy On 25/10/17 12:19, Bruno P.
Kinoshita wrote: I think one of the tests is failing when I run `mvn clean test install`, and also when I run the same `mvn clean test -e -X -DforkMode=never` in debug mode in Eclipse. The compact_prefixes_3 test method expects a directory like DB/Data-0001, but there is only a DB folder. The methods to create the Data-0001 switchable location were called, but for some reason nothing happened. Didn't have much time to thoroughly investigate it, so will have to leave the error here for others to take a look. Will have more time to look into it tomorrow evening NZ time. Running org.apache.jena.tdb2.sys.TestDatabaseOps Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.028 sec <<< FAILURE! - in org.apache.jena.tdb2.sys.TestDatabaseOps compact_prefixes_3(org.apache.jena.tdb2.sys.TestDatabaseOps) Time elapsed: 0.053 sec <<< ERROR! java.lang.NullPointerException at org.apache.jena.tdb2.sys.TestDatabaseOps.compact_prefixes_3(TestDatabaseOps.java:142) Running org.apache.jena.tdb2.sys.TestSys Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec - in org.apache.jena.tdb2.sys.TestSys Running org.apache.jena.tdb2.sys.TestDatabaseConnection Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.103 sec - in org.apache.jena.tdb2.sys.TestDatabaseConnection Running org.apache.jena.tdb2.assembler.TestTDBAssembler Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.449 sec - in org.apache.jena.tdb2.assembler.TestTDBAssembler Running org.apache.jena.tdb2.TestDatabaseMgr Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.088 sec - in org.apache.jena.tdb2.TestDatabaseMgr Running org.apache.jena.tdb2.solver.TestSolverTDB Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.048 sec - in org.apache.jena.tdb2.solver.TestSolverTDB Running org.apache.jena.tdb2.solver.TestStats Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.013 sec - in org.apache.jena.tdb2.solver.TestStats Results : Tests
in error: TestDatabaseOps.compact_prefixes_3:142 » NullPointer Tests run: 537, Failures: 0, Errors: 1, Skipped: 7 [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Jena - Project .. SUCCESS [ 1.902 s] [INFO] Apache Jena - Shadowed external libraries .. SUCCESS [ 16.081 s] [INFO] Apache Jena - IRI .. SUCCESS [ 5.970 s] [INFO] Apache Jena - Base Common Environment .. SUCCESS [ 16.289 s] [INFO] Apache Jena - Core . SUCCESS [01:28 min] [INFO] Apache Jena - ARQ (SPARQL 1.1 Query Engine) SUCCESS [01:30 min] [INFO] Apache Jena - RDF Connection ... SUCCESS [ 8.349 s] [INFO] Apache Jena - TDB (Native Triple Store) SUCCESS [ 24.995 s] [INFO] Apache Jena - Database Operation Environment ... SUCCESS [ 0.201 s] [INFO] Apache Jena -
Re: @Test TestDatabaseOps.compact_prefixes_3 [Was Re: [] Release Apache Jena 3.5.0]
Just checked out the jena-3.5.0-rc1 tag and ran the complete (`mvn clean install`) build successfully... Mac OS 10.12.6 Java(TM) SE Runtime Environment (build 1.8.0_65-b17) Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode) I didn't see the prob... love these intermittent bugs! ajs6f Andy Seaborne wrote on 10/25/17 9:39 PM: Jena_Development_Test#2766 shows the same test failure. Ubuntu. The changes for that run were unrelated to TDB2. So we have an intermittent failure and it suggests it is the test harness (the test itself is entirely deterministic). Jenkins is a bit unwell at the moment and things are very slow - there is a jena-core failure that occurs as well in an area that hasn't changed in a long while. So Jenkins might trigger the intermittent test situation. ajs6f - You ran tests for the version bump? TDB2 is in the -Pdev profile. Rob - did you run -Pdev? Andy On 25/10/17 18:26, Andy Seaborne wrote: Bruno, The absence of the Data-0001 is very strange. It is created for every test in the DatabaseMgr.connectDatasetGraph call and other tests work. I don't have access to OSX or Windows for testing and compaction is playing around with directories and files on disk (java, portable, ...) but the code for compaction, and its tests, is quite recent. I did run a build+test with the source-release zip file on Linux. A few things to try: 1/ Could you try running the test in isolation? It should run in Eclipse by pointing at the test and running just that one @Test. Is there a stacktrace? 2/ Run with mvn -fn (--fail-never) which should make maven run the other modules and tests so showing if there are any other problems. 3/ I can see that parallel tests would mess up the test setup. I've just read the surefire plugin docs and I'm still not sure what the default is - it might be thread per core. Can you try forcing no parallel tests please? [*] I can't see a place for an NPE on line 142, I'm guessing it comes in the Txn.executeRead that follows.
g.getPrefixMapping().getNsURIPrefix( and I'll guess that it's the second object access (getPrefixMapping()) but it might be deeper - no stack trace? I can't see how that connects to the absence of Data-0001. [*] Module jena-tdb2 does not set up surefire and relies on defaults. It should run TC_TDB org.apache.maven.plugins maven-surefire-report-plugin **/TC_*.java Andy On 25/10/17 12:19, Bruno P. Kinoshita wrote: I think one of the tests is failing when I run `mvn clean test install`, and also when I run the same `mvn clean test -e -X -DforkMode=never` in debug mode in Eclipse. The compact_prefixes_3 test method expects a directory like DB/Data-0001, but there is only a DB folder. The methods to create the Data-0001 switchable location were called, but for some reason nothing happened. Didn't have much time to thoroughly investigate it, so will have to leave the error here for others to take a look. Will have more time to look into it tomorrow evening NZ time. Running org.apache.jena.tdb2.sys.TestDatabaseOps Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.028 sec <<< FAILURE! - in org.apache.jena.tdb2.sys.TestDatabaseOps compact_prefixes_3(org.apache.jena.tdb2.sys.TestDatabaseOps) Time elapsed: 0.053 sec <<< ERROR!
java.lang.NullPointerException at org.apache.jena.tdb2.sys.TestDatabaseOps.compact_prefixes_3(TestDatabaseOps.java:142) Running org.apache.jena.tdb2.sys.TestSys Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec - in org.apache.jena.tdb2.sys.TestSys Running org.apache.jena.tdb2.sys.TestDatabaseConnection Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.103 sec - in org.apache.jena.tdb2.sys.TestDatabaseConnection Running org.apache.jena.tdb2.assembler.TestTDBAssembler Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.449 sec - in org.apache.jena.tdb2.assembler.TestTDBAssembler Running org.apache.jena.tdb2.TestDatabaseMgr Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.088 sec - in org.apache.jena.tdb2.TestDatabaseMgr Running org.apache.jena.tdb2.solver.TestSolverTDB Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.048 sec - in org.apache.jena.tdb2.solver.TestSolverTDB Running org.apache.jena.tdb2.solver.TestStats Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.013 sec - in org.apache.jena.tdb2.solver.TestStats Results : Tests in error: TestDatabaseOps.compact_prefixes_3:142 » NullPointer Tests run: 537, Failures: 0, Errors: 1, Skipped: 7 [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Jena - Project .. SUCCESS [ 1.902 s] [INFO] Apache Jena - Shadowed external li
Re: [] Release Apache Jena 3.5.0
Possible option: change the default Maven profile to skip TDB2 for this 3.5.0 release? We buy ourselves some time (at least until a potential 3.5.1 with a stabilized TDB2 test regime) but we still keep TDB2 as available as possible. I'm quite uncomfortable with releasing with the main profile unstable, but I also don't want to block release on a final fix for whatever Bruno has come across, and I also want to get TDB2 out there so that people can start to mess with it (in the good way). I am still in Vienna, so only 50% on at most, but I will try to reproduce Bruno's report. ajs6f Andy Seaborne wrote on 10/25/17 6:50 PM: Bruno, Thank you for running these tests. == What to do about the 3.5.0 release TDB2 is marked as experimental and I don't know how else to break the deadlock of not getting used for real except by a release. I've hammered as much as I can. The absence of Data-0001 suggests it is a test setup/teardown problem, not a compaction problem. Compaction is currently quite difficult to access (it isn't available live from Fuseki). Options: * Pull 3.5.0 and fix it. Unbounded wait. * Remove TDB2 etc from 3.5.0. * (if possible) identify the problem, then release with a note attached. We can't have one part of Jena blocking the rest - there are still all the incremental improvements and all the contributions to get out. It's a compaction test and compaction does not remove data. The previous version of the database is accessed read-only (with writers locked out, but it is a read transaction). However I'm biased. Andy On 25/10/17 12:19, Bruno P. Kinoshita wrote: I think one of the tests is failing when I run `mvn clean test install`, and also when I run the same `mvn clean test -e -X -DforkMode=never` in debug mode in Eclipse. The compact_prefixes_3 test method expects a directory like DB/Data-0001, but there is only a DB folder. The methods to create the Data-0001 switchable location were called, but for some reason nothing happened.
Didn't have much time to thoroughly investigate it, so will have to leave the error here for others to take a look. Will have more time to look into it tomorrow evening NZ time. Running org.apache.jena.tdb2.sys.TestDatabaseOps Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.028 sec <<< FAILURE! - in org.apache.jena.tdb2.sys.TestDatabaseOps compact_prefixes_3(org.apache.jena.tdb2.sys.TestDatabaseOps) Time elapsed: 0.053 sec <<< ERROR! java.lang.NullPointerException at org.apache.jena.tdb2.sys.TestDatabaseOps.compact_prefixes_3(TestDatabaseOps.java:142) Running org.apache.jena.tdb2.sys.TestSys Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec - in org.apache.jena.tdb2.sys.TestSys Running org.apache.jena.tdb2.sys.TestDatabaseConnection Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.103 sec - in org.apache.jena.tdb2.sys.TestDatabaseConnection Running org.apache.jena.tdb2.assembler.TestTDBAssembler Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.449 sec - in org.apache.jena.tdb2.assembler.TestTDBAssembler Running org.apache.jena.tdb2.TestDatabaseMgr Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.088 sec - in org.apache.jena.tdb2.TestDatabaseMgr Running org.apache.jena.tdb2.solver.TestSolverTDB Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.048 sec - in org.apache.jena.tdb2.solver.TestSolverTDB Running org.apache.jena.tdb2.solver.TestStats Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.013 sec - in org.apache.jena.tdb2.solver.TestStats Results : Tests in error: TestDatabaseOps.compact_prefixes_3:142 » NullPointer Tests run: 537, Failures: 0, Errors: 1, Skipped: 7 [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Jena - Project .. SUCCESS [ 1.902 s] [INFO] Apache Jena - Shadowed external libraries .. SUCCESS [ 16.081 s] [INFO] Apache Jena - IRI .. SUCCESS [ 5.970 s] [INFO] Apache Jena - Base Common Environment .. 
SUCCESS [ 16.289 s] [INFO] Apache Jena - Core . SUCCESS [01:28 min] [INFO] Apache Jena - ARQ (SPARQL 1.1 Query Engine) SUCCESS [01:30 min] [INFO] Apache Jena - RDF Connection ... SUCCESS [ 8.349 s] [INFO] Apache Jena - TDB (Native Triple Store) SUCCESS [ 24.995 s] [INFO] Apache Jena - Database Operation Environment ... SUCCESS [ 0.201 s] [INFO] Apache Jena - DBOE Base SUCCESS [ 9.072 s] [INFO] Apache Jena - DBOE Transactions SUCCESS [ 7.257 s] [INFO] Apache Jena - DBOE Indexes . SUCCESS [ 4.289 s] [INFO] Apache Jena - DBOE Index test suite SUCCESS [ 4.663 s] [INFO] Apache Jena - DBOE
more benchmarking
https://iswc2017.semanticweb.org/paper-70/ is a paper in the main conference track at ISWC. It is exercising Jena 2.3.0, but I'm not sure if that version number is for Jena/TDB or Fuseki. It is using Java 7, which makes me think they mean Fuseki 2.3.0 (Jena/TDB 3.0.0), which is still considerably out of date. Same story, different day... -- ajs6f
[GitHub] jena pull request #294: Fix and tests for possible NPE (JENA-1405)
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/294#discussion_r146827609 --- Diff: jena-arq/src/main/java/org/apache/jena/riot/adapters/AdapterFileManager.java --- @@ -285,6 +286,12 @@ protected Model readModelWorker(Model model, String filenameOrURI, String baseUR if ( baseURI == null ) baseURI = SysRIOT.chooseBaseIRI(filenameOrURI) ; try(TypedInputStream in = streamManager.openNoMapOrNull(mappedURI)) { +if ( in == null ) +{ +if ( log.isDebugEnabled() ) +log.debug("Failed to locate '"+mappedURI+"'") ; --- End diff -- As I understand it, `log.debug("Failed to locate '{}'", mappedURI)` [avoids the need](https://www.slf4j.org/faq.html#logging_performance) to explicitly check `isDebugEnabled()`. ---
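The point of the SLF4J FAQ entry: with a `{}` placeholder, the message string is only assembled after the level check, whereas string concatenation (as in the diff line) builds the message before `debug` is even called. A toy logger illustrating that, assuming nothing about SLF4J's internals; `LazyLog` is hypothetical and simply counts how often a message is actually built.

```java
/**
 * Toy logger (hypothetical, not SLF4J) showing why a "{}" placeholder
 * makes an explicit isDebugEnabled() guard unnecessary for cheap arguments.
 */
public class LazyLog {
    private final boolean debugEnabled;
    int messagesBuilt = 0;     // how many times a message string was assembled
    String lastMessage = null; // most recent message that was actually logged

    LazyLog(boolean debugEnabled) {
        this.debugEnabled = debugEnabled;
    }

    // Pattern and argument arrive unformatted; substitution only happens
    // after the level check, so a disabled call builds no string at all.
    void debug(String pattern, Object arg) {
        if (!debugEnabled) return;
        lastMessage = substitute(pattern, arg);
        System.out.println("DEBUG " + lastMessage);
    }

    // Replace the first "{}" with the string form of arg.
    private String substitute(String pattern, Object arg) {
        messagesBuilt++;
        int i = pattern.indexOf("{}");
        if (i < 0) return pattern;
        return pattern.substring(0, i) + arg + pattern.substring(i + 2);
    }

    public static void main(String[] args) {
        String mappedURI = "file:data.ttl";
        new LazyLog(false).debug("Failed to locate '{}'", mappedURI); // prints nothing
        new LazyLog(true).debug("Failed to locate '{}'", mappedURI);  // prints DEBUG Failed to locate 'file:data.ttl'
    }
}
```

Arguments are still evaluated before the call either way, so an `isDebugEnabled()` guard keeps its value when computing an argument is itself expensive; for a plain variable like `mappedURI` it is not needed.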
Re: github stuff Was: [2/2] jena git commit: JENA-1391: adding isEmpty method to Dataset
Super +1 to going to gitpubsub. I am agnostic (because I don't know enough to have a very informed opinion) about site processing tools. (I've had no problem using plain ol' Maven Site processing, but I never used it on very large projects, nothing with a site the size of Jena's.) I had not heard of Jekyll, so I went and looked at https://jekyllrb.com/ but it appears to be a Ruby product? Would we run it somehow via JRuby from within Maven? Or as an exec task? ajs6f Bruno P. Kinoshita wrote on 10/23/17 5:11 AM: Is there a git+CMS option? (or mirror git to SVN then ...) More or less. It was enabled in 2015: https://blogs.apache.org/infra/entry/git_based_websites_available You must have the web site in the asf-site branch. No mirror to SVN as far as I know... CMS may not be around forever, and while the markdown isn't all standard, it's quite close. (e.g. the processor "Title:" stuff) OpenNLP had a few issues migrating to JBake, but a bit of IDE-fu + regex did the trick. The difference with Jena, I think, is that Jena's documentation is much more extensive, which means more pages to edit. But doable nevertheless. I don't know what processor is behind CMS - python based? Home grown/modified? Home grown, like the old IRC bot/factoid code (written in Lua I think), the Help Wanted app (Lua too I believe). The CMS is a mix of Perl and Python. Source code for the curious: https://svn.apache.org/repos/infra/websites/cms/ I've used Jekyll (choice based on choosing a commonly used system to increase the longevity and stability of the choice). Jekyll is my preferred option as well, as there are heaps more documentation / examples / templates to re-use. The work for OpenNLP is done in https://github.com/apache/opennlp-site, in the master branch. Then, there is a job somewhere, set up by ASF Infra, that pulls the master branch, runs `mvn clean package ...`, and then deploys the resulting static files onto the asf-site branch.
The site is served with the contents of that asf-site branch. One can build pull requests locally with `mvn clean package ...` and preview the web site, without having to wait and preview in the staging web site. Or even preview in GitHub pages as well. The decision for JBake was already made when I joined, so I just helped with a few issues. It works quite well though. But if ASF Infra is able to build a Jekyll project, then we could use it instead. Bruno From: Andy Seaborne To: dev@jena.apache.org Sent: Monday, 23 October 2017 3:09 AM Subject: Re: github stuff Was: [2/2] jena git commit: JENA-1391: adding isEmpty method to Dataset Is there a git+CMS option? (or mirror git to SVN then ...) CMS may not be around forever, and while the markdown isn't all standard, it's quite close. (e.g. the processor "Title:" stuff) I don't know what processor is behind CMS - python based? Home grown/modified? I've used Jekyll (choice based on choosing a commonly used system to increase the longevity and stability of the choice). Andy On 16/10/17 19:51, Bruno P. Kinoshita wrote: A few months ago I helped set up OpenNLP's new website building from github. The source is in an opennlp-site git repository, and is built with maven using the maven jbake static site generator. Before that, they were using the svn cms pubsub if I recall correctly. Maybe we could have something similar if others like this approach. Cheers, Bruno Sent from Yahoo Mail on Android On Tue, 17 Oct 2017 at 2:12, aj...@apache.org wrote: Andy Seaborne wrote on 10/13/17 3:40 PM: If anyone is interested in following it up, I have read that Apache projects can now use gitbox whereby all work is on Github, including the full PR cycle, and the ASF is mirrored back. To us, it looks like the GH is the master and ASF the mirror (IIRC it's a bit more complicated under the hood for INFRA than that). Andy That sounds good to me. Is this the sort of thing for which I could just file a ticket on INFRA and follow up with them?
As long as we are digressing, you know what I would really love? Being able to do our docs/site in git/github. I'm pretty sure other Apache projects manage to do that... ajs6f
GitBox Was: github stuff
It's been surprisingly hard for me to find docs about how GitHub vs. Apache-side git works, but that may be my problem! :grin: I have found: https://gitbox.apache.org/setup/ and linked my accounts. But I see no change in behavior-- I cannot push a test branch to Github. So perhaps we (Jena) need to do something to enable this? ajs6f Andy Seaborne wrote on 10/22/17 4:02 PM: On 16/10/17 14:12, aj...@apache.org wrote: Andy Seaborne wrote on 10/13/17 3:40 PM: If anyone is interested in following it up, I have read that Apache projects can now use gitbox whereby all work is on Github, including the full PR cycle, and the ASF is mirrored back. To us, it looks like the GH is the master and ASF the mirror (IIRC it's a bit more complicated under the hood for INFRA than that). Andy That sounds good to me. Is this the sort of thing for which I could just file a ticket on INFRA and follow up with them? Step one is to investigate - I've just seen it mentioned and don't know the details of what it does, and what it does at ASF. Do you want to find out? Andy ...
Re: Release Jena 3.5.0?
I have no problem with this plan, but just to check: Andy, I believe you are thinking that the "obvious ones" are the ones that required no code changes and elicited no comments from you? Because if you want to include them, I can make a new PR quickly with just them (assuming I can fix whatever weird formatting thing blew up that first PR). ajs6f Andy Seaborne wrote on 10/23/17 2:39 PM: All being well (usual caveats about "things" happening), I'll do the release in the next few days. I hope to include: JENA-1403: Tidy up regex pattern handling. #292 Bad regular expression patterns should throw ExprEvalException #291 Spell checking some Javadocs #290 The version bump PR#289 has some obvious ones to do and some things that need looking at. But the small deltas are obscured by an accidental reformat. As none are for bug fixes, I think that can wait. Close tracking of dependencies is usually more helpful than occasional jumps. If anything else low-risk turns up, I'll try to get it in - PRs work really well for this and there is no need to pause anything. I'll see direct commits to master but please only for trivial, zero-risk things. Andy On 23/10/17 03:52, Bruno P. Kinoshita wrote: Didn't have much time to contribute lately, so decided to spend some time during a Monday holiday here and spell check javadocs & site. Javadoc updates were done in https://github.com/apache/jena/pull/290 Site updates were done in r1812967. Didn't find anything to fix looking at the templates in Fuseki2 web app. Might have more time to review JIRA and see if there is any small issue for the web site. And I have a branch somewhere in one of my workstations with some tests, to increase test coverage a bit. Thanks for preparing 3.5.0!!! Bruno From: Andy Seaborne To: "dev@jena.apache.org" Sent: Tuesday, 17 October 2017 11:32 AM Subject: Release Jena 3.5.0? The tick is approaching. Are we ready to go? JIRA to be marked resolved? If so, I'll sort out a release soon. 
Andy Here's a list of changes of note that I gathered: Release changes Introducing TDB2: http://jena.staging.apache.org/documentation/tdb2/ *TDB2 is not compatible with TDB1* Compared to TDB1: * No size limits on transactions : bulk uploads into a live Fuseki can be 100's of millions of triples. * Models and Graphs can be passed across transactions * No queue of delayed updates, no transaction backlog problems. * "Writer pays" - readers don't. All work for update is done on the writer thread. * Datatypes of numerics preserved; xsd:doubles supported. TDB2 is subject to change. We solicit any and all feedback (good and bad!) about TDB2 to help advance it to deployment-ready. JENA-1390 : Add StmtIterator.toModel : JENA-1392 : Add dynamic dataset support to SDB. JENA-1395 : "--output RDF/XML" now prints using the basic block-oriented writer, which uses less memory. Use "--formatted" (same as "--pretty") for pretty printed RDF/XML. JENA-1398 : Upgrade FOAF to add new spelling and deprecation of old for archaic FOAF properties == Dependency changes: No license changes. Upgrade jsonld-java to 0.11, jackson to 2.9.0, commons-fileupload 1.3.2 -> 1.3.3, commons-io 2.5 in jena-base (was pulled in anyway by jsonld-java)
Re: At ISWC in Vienna
Hi, Jean-Marc, I am here in Vienna as well. I'm not sure if any other committers/PMC members are here, but I would be happy to meet about any Jena stuff you want to talk about. --- A. Soroka Research Computing : Office of the CIO : the Smithsonian Institution Jean-Marc Vanel wrote on 10/23/17 9:58 AM: Hi, I'm at ISWC in Vienna until Wednesday. If you are also there, we could meet to chat. (Better to answer privately.)
[GitHub] jena pull request #289: Version bumps for 3.5
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/289#discussion_r146109452 --- Diff: jena-project/pom.xml --- @@ -1,867 +1,828 @@ [Diff of jena-project/pom.xml (org.apache.jena:jena-project 3.5.0-SNAPSHOT, parent org.apache:apache:18) not reproduced: its XML markup did not survive in this archive, and the excerpt breaks off partway through the dependency list.]
[GitHub] jena issue #289: Version bumps for 3.5
Github user ajs6f commented on the issue: https://github.com/apache/jena/pull/289 Wait, something has gone wrong here. The only changes I made were to the values of a few properties in that `pom.xml`. I have no idea why it's doing such a giant diff. I need to figure that out. ---
[GitHub] jena issue #289: Version bumps for 3.5
Github user ajs6f commented on the issue: https://github.com/apache/jena/pull/289 Okay, let's let it hang until after the release then. And I'll file a Jira. ---
Re: Property Paths benchmark @ ISWC2017
I think Rob's suggested message is pretty reasonable. I think what we can do in this situation is to help open a larger conversation about what is fair and what is desirable for this kind of research. ajs6f Andy Seaborne wrote on 10/20/17 5:30 PM: On 20/10/17 11:13, Rob Vesse wrote: On 20/10/2017 15:56, "Andy Seaborne" wrote: Given this, references to the 2015 are spurious and misleading. If you read the original bachelor's thesis that Marco referenced [1] the equivalent text and the footnote is as follows: 3 https://jena.apache.org/ retrieved at 13.12.2015 Which would indeed be Jena 3.0.1, so the original research was started in December 2015 and completed sometime between then and July 2016 when that thesis was submitted. I'm not disputing that at all - but the average reader will read the paper and that's what it claims. Clearly it's wrong because we look harder; others may take it at face value. I would guess that when it was reformatted into a workshop paper they simply checked that all the URLs still worked and updated the footnotes accordingly. Maybe we are just splitting hairs and expecting too much, it just frustrates me when someone discovers a problem and makes no effort to resolve it. +1 Rob [1] https://west.uni-koblenz.de/sites/default/files/studying/theses-files/bachelorarbeit-adrian-skubella-benchmarks-for-sparql-property-paths.pdf
[GitHub] jena pull request #289: Version bumps for 3.5
GitHub user ajs6f opened a pull request: https://github.com/apache/jena/pull/289 Version bumps for 3.5 There are 5 commits here, the first 4 of which are (I think) non-controversial. In the last, to get from Commons Lang 3.4 to 3.5 (and thence to 3.6) I had to change test code. I think the changes are kosher-- using `Z` instead of `+00:00` is legit according to the [XSD Datatypes spec](https://www.w3.org/TR/xmlschema11-2/#nt-tzFrag). But it might cause some surprises. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajs6f/jena VersionBumpsFor3.5 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/jena/pull/289.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #289 commit 5f597bc598635a2553ca12a70a9d4e25a1b76246 Author: ajs6f Date: 2017-10-21T07:23:27Z Bump contract test machinery versions commit a0cf3320242dc1f456cd3e791760c16ff329d7a1 Author: ajs6f Date: 2017-10-21T07:32:38Z Bump log4j2 version commit 0a1a0caa637f14c7b37a15ed96971611af90a30b Author: ajs6f Date: 2017-10-21T07:41:23Z Commons lib version bumps commit 0f82c01e1b71c136c1e1a5677e1ec58a207f23d6 Author: ajs6f Date: 2017-10-21T07:50:01Z Bump Thrift version commit 9d05531d6690162f460390e5427f406cf1ac415c Author: ajs6f Date: 2017-10-21T09:52:58Z Bumping Commons Lang 3.4 -> 3.6 ---
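On the `Z` versus `+00:00` point, here is a quick way to check that both spellings denote the same XSD timezone (a zero offset). This is a minimal sketch using the JDK's own `java.time`, not Commons Lang and not Jena's XSD datatype code. The class and method names are made up for illustration. Both lexical forms parse to the same instant, and `java.time` writes a zero offset back out as `Z`.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class XsdTimezoneDemo {

    /** Parses a dateTime lexical form that carries an explicit timezone ("Z" or "+hh:mm"). */
    public static OffsetDateTime parse(String lexical) {
        return OffsetDateTime.parse(lexical, DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    /** True when both lexical forms denote the same point on the timeline. */
    public static boolean sameInstant(String a, String b) {
        return parse(a).isEqual(parse(b));
    }

    /** Re-serializes the value; java.time writes a zero offset as "Z". */
    public static String reformat(String lexical) {
        return parse(lexical).format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    public static void main(String[] args) {
        String withZ = "2017-10-21T09:52:58Z";
        String withOffset = "2017-10-21T09:52:58+00:00";
        System.out.println(sameInstant(withZ, withOffset));                        // true
        System.out.println(reformat(withOffset));                                   // 2017-10-21T09:52:58Z
        System.out.println(parse(withOffset).getOffset().equals(ZoneOffset.UTC));  // true
    }
}
```

A test that compares parsed values rather than lexical strings does not care which of the two spellings a library chooses to emit.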
[GitHub] jena issue #37: JENA-732 jena-maven-tools outputs to target/generated-source...
Github user ajs6f commented on the issue: https://github.com/apache/jena/pull/37 @stain Just pinging again-- is this PR still in flight, or does it still make sense? ---
Re: Java 9 branch? (Was: Release Jena 3.5.0?)
Who manages the ASF parent pom.xml? INFRA? Perhaps we can help move it forward? ajs6f Claude Warren wrote on 10/20/17 11:18 PM: Not really an immediate need so much as just wondering how close our code is to working under Java 9. I think it would also be nice to know when the various tools we use are Java 9 ready and perhaps lend them a hand if need be. More curiosity than anything else. Claude On Fri, Oct 20, 2017 at 3:47 PM, Andy Seaborne wrote: Claude - you can see branches that exist via the GH interface. And, no, there isn't one. There is a jenkins job - it does not work, waiting on updates to roll through. Taking over version mgt of the plugins from the ASF parent seems to me like extra work for little benefit. Unless there is an immediate need? Andy On 19/10/17 08:19, Claude Warren wrote: Did we get a Java 9 branch started? Seems like most of the issues are around tooling, not functionality of the product. If this is the case I would expect the differences between the java9 branch and the master to be contained in the pom.xml files. On Wed, Oct 18, 2017 at 2:20 AM, Andy Seaborne wrote: That would be good to see. Personally, I think that ways to use modules in terms of good practice and patterns, and also frameworks, in the java ecosystem will emerge but anything we can do to reduce barriers seems like a good thing. On Java9 generally: The build itself doesn't work with Java9 because it needs updated versions of some plugins, and those are inherited from the Apache parent POM. To take over the version control and override the std settings just seems like much work to get ahead by a short period of time. I'm assuming we stay on java8 as the requirement for applications for a while yet. Andy On 17 October 2017 at 11:20, Aaron Coburn wrote: Would it make sense to add an Automatic-Module-Name header to the manifest files so that Jena is easier to use in a JDK9 context? I could even volunteer to do this. 
Aaron On Oct 17, 2017, at 9:56 AM, aj...@apache.org wrote: Claude-- I see some updates available for the contract test machinery: org.xenei:contract-test-maven-plugin 0.1.5 -> 0.1.7, org.xenei:junit-contracts 0.1.5 -> 0.1.7. Worth doing before a release? ajs6f Andy Seaborne wrote on 10/16/17 6:32 PM: The tick is approaching. Are we ready to go? JIRA to be marked resolved? If so, I'll sort out a release soon. Andy Here's a list of changes of note that I gathered: Release changes Introducing TDB2: http://jena.staging.apache.org/documentation/tdb2/ *TDB2 is not compatible with TDB1* Compared to TDB1: * No size limits on transactions : bulk uploads into a live Fuseki can be 100's of millions of triples. * Models and Graphs can be passed across transactions * No queue of delayed updates, no transaction backlog problems. * "Writer pays" - readers don't. All work for update is done on the writer thread. * Datatypes of numerics preserved; xsd:doubles supported. TDB2 is subject to change. We solicit any and all feedback (good and bad!) about TDB2 to help advance it to deployment-ready. JENA-1390 : Add StmtIterator.toModel : JENA-1392 : Add dynamic dataset support to SDB. JENA-1395 : "--output RDF/XML" now prints using the basic block-oriented writer, which uses less memory. Use "--formatted" (same as "--pretty") for pretty printed RDF/XML. JENA-1398 : Upgrade FOAF to add new spelling and deprecation of old for archaic FOAF properties == Dependency changes: No license changes. Upgrade jsonld-java to 0.11, jackson to 2.9.0, commons-fileupload 1.3.2 -> 1.3.3, commons-io 2.5 in jena-base (was pulled in anyway by jsonld-java)
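Aaron's `Automatic-Module-Name` suggestion amounts to one extra manifest attribute per jar. In a real build the header would more likely be set through the jar plugin's manifest configuration. As a minimal sketch using only `java.util.jar`, the code below writes the header and reads it back. It also uses `javax.lang.model.SourceVersion.isName` to check, roughly, that a candidate name is a legal dotted Java name. The module name `org.apache.jena.arq` is hypothetical; nothing in this thread settles on names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;
import javax.lang.model.SourceVersion;

public class AutomaticModuleNameDemo {

    static final Attributes.Name AUTOMATIC_MODULE_NAME = new Attributes.Name("Automatic-Module-Name");

    /** A module name must be a dot-separated sequence of legal Java identifiers: no keywords, no '-'. */
    public static boolean isLegalModuleName(String name) {
        return SourceVersion.isName(name);
    }

    /** Writes a manifest carrying the header, reads it back, and returns the header's value. */
    public static String roundTrip(String moduleName) {
        if (!isLegalModuleName(moduleName))
            throw new IllegalArgumentException("Not a legal module name: " + moduleName);
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // In the JDK's implementation, Manifest.write skips the main attributes
        // unless a version header is present, so set Manifest-Version explicitly.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(AUTOMATIC_MODULE_NAME, moduleName);
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            manifest.write(out);
            Manifest reread = new Manifest(new ByteArrayInputStream(out.toByteArray()));
            return reread.getMainAttributes().getValue(AUTOMATIC_MODULE_NAME);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("org.apache.jena.arq")); // hypothetical module name
        System.out.println(isLegalModuleName("jena-arq"));     // false: '-' is not allowed
    }
}
```

The name check matters because Maven artifact ids such as `jena-arq` contain `-`, so they cannot be used verbatim as module names.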
Re: Property Paths benchmark @ ISWC2017
Perhaps the first line of work could be to contact the authors and ask them: Did you contact Jena (or for that matter, any of the other projects) for this work? Why did you use such an old version of Jena? Would you be willing to try again with a modern version? If the results are significantly different (as they almost certainly will be) would you be willing to make an emendation for your workshop paper? ajs6f Marco Neumann wrote on 10/19/17 12:10 PM: just on a side note since this is "only" a workshop contribution it will not make an appearance in the conference itself and will not appear in the main ISWC 2017 conference proceedings published by Springer but only as an independent publication of the workshop itself. Responsibility for the workshop sits with the Organising Committee: Axel-Cyrille Ngonga Ngomo, Institute for Applied Informatics, Leipzig, Germany; Anastasia Krithara, National Center for Scientific Research “Demokritos”, Athens, Greece; Irini Fundulaki, ICS-FORTH, Heraklion, Crete, Greece; and for review the Program Committee: Milos Jovanovik, OpenLink Software, United Kingdom; Pavlos Fafalios, University of Hannover, Germany; Kostas Stefanidis, University of Tampere, Finland; Muhammad Saleem, AKSW, University of Leipzig, Germany; Manolis Terrovitis, IMIS, RC Athena, Greece; Ricardo Usbeck, University of Leipzig, Germany; George Papastefanatos, IMIS RC Athena, Greece; Stasinos Kostantopoulos, NCSR Demokritos, Greece. On Thu, Oct 19, 2017 at 3:51 PM, wrote: I hadn't intended to spend time at the benchmarking sessions at ISWC, but if it seems useful, I can try and raise this issue in person. I suppose partly it's a question of setting the record straight, and then partly it's a question of standing up for good practice, and then it's also a question of protecting Jena from unmerited negative consequences. I don't know how widely used such benchmarks are. 
Except for a few high-profile projects, I rarely see anyone refer to this sort of evidence as a reason to adopt or not to adopt a system. ajs6f Marco Neumann wrote on 10/19/17 9:26 AM: Rob, unfortunately this is more common in Semantic Web research papers than one might expect. I have seen this before in particular with regard to perceived shortcomings of Jena or its components. It might be a good idea to bring this to the attention of affiliated people in the organisation (here University of Southampton and Koblenz-Landau). While I don't think this is an intentional attempt to bring Jena into disrepute, the situation could be clarified and addressed by the ISWC workshop or track chair as well. I wish your mentioned "standard Industry and research practice" would be more common than it currently is. BTW, the thesis report is dated July 2016. On Thu, Oct 19, 2017 at 12:08 PM, Rob Vesse wrote: Marco, I don’t believe anyone has tried to contact them yet. I think that the complaints here are that there doesn’t appear to have been any attempt to report the issues identified back to the projects studied. If this were a security flaw in the project, the standard Industry and research practice would be to make a responsible disclosure to the projects in advance of the public disclosure such that the researchers and projects can work together to resolve the problem. The implication being that it is irresponsible for the authors to benefit from pointing out flaws in the projects while appearing to make no efforts to help report/resolve those issues. As you suggest, this paper does appear to be based upon some thesis work; that thesis indicates that the research was originally carried out in 2015, implying that the author knew of the issue two years ago. The project has a relatively small core of developers, most of whom work on Jena on the side. We very much rely upon the wider community to provide input on bugs that need to be resolved, e.g. 
performance issues and the features we should prioritise. When someone clearly knew of a problem but didn’t tell us, that is inevitably frustrating for the project. Rob On 19/10/2017 10:08, "Marco Neumann" wrote: Did you try to contact Daniel Janke, Adrian Skubella or Steffen Staab to get a response? The findings seem to be based on work that has been published online as part of a bachelor’s thesis by Adrian Skubella. https://west.uni-koblenz.de/sites/default/files/studying/theses-files/bachelorarbeit-adrian-skubella-benchmarks-for-sparql-property-paths.pdf On Thu, Oct 19, 2017 at 10:54 AM, Lorenz B. wrote: For me this is really bad practice. It also looks like they did the benchmark more than one year ago. Otherwise due to JENA-1195 this error wouldn't occur anymore. And submission deadline was August 6th, 2017. Their experiments contain 8 queries, rerunning those shouldn't take ages... I'm currently trying to reproduce the results of
Re: Property Paths benchmark @ ISWC2017
I hadn't intended to spend time at the benchmarking sessions at ISWC, but if it seems useful, I can try and raise this issue in person. I suppose partly it's a question of setting the record straight, and then partly it's a question of standing up for good practice, and then it's also a question of protecting Jena from unmerited negative consequences. I don't know how widely used such benchmarks are. Except for a few high-profile projects, I rarely see anyone refer to this sort of evidence as a reason to adopt or not to adopt a system. ajs6f Marco Neumann wrote on 10/19/17 9:26 AM: Rob, unfortunately this is more common in Semantic Web research papers than one might expect. I have seen this before in particular with regard to perceived shortcomings of Jena or its components. It might be a good idea to bring this to the attention of affiliated people in the organisation (here University of Southampton and Koblenz-Landau). While I don't think this is an intentional attempt to bring Jena into disrepute, the situation could be clarified and addressed by the ISWC workshop or track chair as well. I wish your mentioned "standard Industry and research practice" would be more common than it currently is. BTW, the thesis report is dated July 2016. On Thu, Oct 19, 2017 at 12:08 PM, Rob Vesse wrote: Marco, I don’t believe anyone has tried to contact them yet. I think that the complaints here are that there doesn’t appear to have been any attempt to report the issues identified back to the projects studied. If this were a security flaw in the project, the standard Industry and research practice would be to make a responsible disclosure to the projects in advance of the public disclosure such that the researchers and projects can work together to resolve the problem. The implication being that it is irresponsible for the authors to benefit from pointing out flaws in the projects while appearing to make no efforts to help report/resolve those issues. 
As you suggest, this paper does appear to be based upon some thesis work; that thesis indicates that the research was originally carried out in 2015, implying that the author knew of the issue two years ago. The project has a relatively small core of developers, most of whom work on Jena on the side. We very much rely upon the wider community to provide input on bugs that need to be resolved, e.g. performance issues and the features we should prioritise. When someone clearly knew of a problem but didn’t tell us, that is inevitably frustrating for the project. Rob On 19/10/2017 10:08, "Marco Neumann" wrote: Did you try to contact Daniel Janke, Adrian Skubella or Steffen Staab to get a response? The findings seem to be based on work that has been published online as part of a bachelor’s thesis by Adrian Skubella. https://west.uni-koblenz.de/sites/default/files/studying/theses-files/bachelorarbeit-adrian-skubella-benchmarks-for-sparql-property-paths.pdf On Thu, Oct 19, 2017 at 10:54 AM, Lorenz B. wrote: For me this is really bad practice. It also looks like they did the benchmark more than one year ago. Otherwise due to JENA-1195 this error wouldn't occur anymore. And submission deadline was August 6th, 2017. Their experiments contain 8 queries, rerunning those shouldn't take ages... I'm currently trying to reproduce the results of the paper, but the whole experimental setup remains unclear. I'm wondering if they used just the Jena CLI or TDB. The same holds for RDF4J. I'm puzzled because the runtimes in the eval section are quite small, but even loading the data of their benchmark takes much more time. So maybe they used the RDF4J server. The worst thing is that they didn't contact any of the developers. Or did they talk to somebody here and then Andy created the ticket JENA-1195? Also for the other queries that failed, I would expect to see tickets on Apache JIRA or at least a hint on the Jena mailing list...
@Andy I'm also wondering whether JENA-1317 addresses the problem with the empty result of the benchmark query containing an inverse property path. On 18.10.2017 17:03, aj...@apache.org wrote: As you know, Andy, I'm going to ISWC this year-- shall I buttonhole them and give them our POV? :grin: In all seriousness, from what I can tell the results amount to "Using older versions of our comparands and without contacting the projects in question we couldn't find a store that implements every property path feature correctly and some fail entirely." I'm not really sure how useful that information
Re: Property Paths benchmark @ ISWC2017
As you know, Andy, I'm going to ISWC this year-- shall I buttonhole them and give them our POV? :grin: In all seriousness, from what I can tell the results amount to "Using older versions of our comparands and without contacting the projects in question we couldn't find a store that implements every property path feature correctly and some fail entirely." I'm not really sure how useful that information is...? But I am ready to do a benchmarking paper for next year. Seems like it's a lot easier than I thought! ajs6f Andy Seaborne wrote on 10/17/17 9:28 AM: Hi Lorenz, Looks like JENA-1195, which is fixed. Does that look like it? I think it is a shame when papers focus on bugs rather than discussing and even fixing them. Bugs aren't research. Path evaluation could be improved to stream in more cases (that's why LIMIT didn't help), but 1195 explains the slowness and memory. Andy On 17/10/17 07:58, Lorenz B. wrote: Hi, I just walked through the papers for the upcoming ISWC conference and found a paper about benchmarking of SPARQL property paths [1]. Not sure if this is relevant, but it looks like Jena has some issues with different types of queries using the property path. For example, SELECT ?o WHERE {A B* ?o.} LIMIT 100 led to an OOM error on non-cyclic data. Here is the relevant part of the paper: While benchmarking Virtuoso, RDF4J and Allegrograph no errors or exceptions have occurred. During the benchmark process of Jena an OutOfMemoryError has been thrown whenever a query with the * operator was used. In order to identify the cause of the error, the amount of results the query should return has been limited to 100. The results that have been returned by a query of the form SELECT ?o WHERE {A B* ?o.} LIMIT 100 where A and B are valid IRIs, consisted of 100 times A. Due to this fact it is presumable that the query containing the * operator returns A recursively until the main memory was full. 
To ensure that this behaviour is not caused by cycles in the dataset a query of the same form but with a predicate IRI that did not exist in the dataset was executed. This query still returned 100 times A. This indicates, that the * operator is not implemented correctly. In addition, the experiments showed that: Due to the problems with the * operator the queries 4, 7 and 8 could not be processed. Additionally query 3, 5, and 6 returned no results after 1 hour and thus, were aborted. Query 1 returned an empty and thus, incomplete result set. Only for query 2 a valid result was returned. Due to the lack of comparable results, Jena has been omitted in the comparison of triple stores. In the discussion section, they summarize the overall performance of Jena as follows: Jena could not return results for any query in under 1 hour besides query 2. Furthermore, the * operator could not be evaluated at all and the inverse operator returned empty result sets. It looks like they used version 3.0.1, so maybe this doesn't hold anymore for all of the queries. If not, it could be interesting to improve performance and/or completeness. I hope I didn't miss some open JIRA ticket, but in general I just wanted to highlight the presence of some published benchmark for those kinds of queries. Cheers, Lorenz [1] http://ceur-ws.org/Vol-1932/paper-04.pdf
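For reference, SPARQL 1.1 gives `*` (zero-or-more) paths set semantics. With a fixed start node, `{A B* ?o}` binds `?o` to `A` itself (the zero-length path) and to every node reachable over `B` edges, each exactly once. This holds even when the data has cycles or contains no `B` triples at all. So 100 copies of `A` is wrong output, not just slow output.

The toy evaluator below illustrates those semantics, and also Andy's point about streaming. Because it is a lazy breadth-first iterator with a visited set, a downstream `limit(n)` stops the traversal early. It is a hypothetical sketch over an in-memory adjacency map, not how ARQ evaluates paths.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class ZeroOrMorePathDemo {

    /**
     * Lazily yields the nodes matched by ?o in { start pred* ?o }: the start node itself
     * (the zero-length path), then every node reachable by one or more edges, each exactly once.
     */
    public static Iterator<String> zeroOrMore(Map<String, List<String>> edges, String start) {
        return new Iterator<String>() {
            private final Deque<String> frontier = new ArrayDeque<>(Collections.singleton(start));
            private final Set<String> visited = new HashSet<>(Collections.singleton(start));

            @Override
            public boolean hasNext() {
                return !frontier.isEmpty();
            }

            @Override
            public String next() {
                if (frontier.isEmpty()) throw new NoSuchElementException();
                String node = frontier.removeFirst();
                for (String succ : edges.getOrDefault(node, Collections.emptyList())) {
                    if (visited.add(succ)) // visited set: no duplicates, and cycles terminate
                        frontier.addLast(succ);
                }
                return node;
            }
        };
    }

    /** Exposes the traversal as a stream so that limit(n) stops pulling early, like LIMIT n. */
    public static Stream<String> stream(Map<String, List<String>> edges, String start) {
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(zeroOrMore(edges, start),
                                                Spliterator.ORDERED | Spliterator.DISTINCT),
            false);
    }

    public static void main(String[] args) {
        Map<String, List<String>> b = new HashMap<>();
        b.put("A", Arrays.asList("B1", "B2"));
        b.put("B1", Arrays.asList("C"));
        b.put("B2", Arrays.asList("C"));
        b.put("C", Arrays.asList("A")); // a cycle back to the start node
        System.out.println(stream(b, "A").collect(Collectors.toList()));              // [A, B1, B2, C]
        System.out.println(stream(new HashMap<>(), "A").collect(Collectors.toList())); // [A]
    }
}
```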
Re: Release Jena 3.5.0?
Claude-- I see some updates available for the contract test machinery: org.xenei:contract-test-maven-plugin 0.1.5 -> 0.1.7, org.xenei:junit-contracts 0.1.5 -> 0.1.7. Worth doing before a release? ajs6f Andy Seaborne wrote on 10/16/17 6:32 PM: The tick is approaching. Are we ready to go? JIRA to be marked resolved? If so, I'll sort out a release soon. Andy Here's a list of changes of note that I gathered: Release changes Introducing TDB2: http://jena.staging.apache.org/documentation/tdb2/ *TDB2 is not compatible with TDB1* Compared to TDB1: * No size limits on transactions : bulk uploads into a live Fuseki can be 100's of millions of triples. * Models and Graphs can be passed across transactions * No queue of delayed updates, no transaction backlog problems. * "Writer pays" - readers don't. All work for update is done on the writer thread. * Datatypes of numerics preserved; xsd:doubles supported. TDB2 is subject to change. We solicit any and all feedback (good and bad!) about TDB2 to help advance it to deployment-ready. JENA-1390 : Add StmtIterator.toModel : JENA-1392 : Add dynamic dataset support to SDB. JENA-1395 : "--output RDF/XML" now prints using the basic block-oriented writer, which uses less memory. Use "--formatted" (same as "--pretty") for pretty printed RDF/XML. JENA-1398 : Upgrade FOAF to add new spelling and deprecation of old for archaic FOAF properties == Dependency changes: No license changes. Upgrade jsonld-java to 0.11, jackson to 2.9.0, commons-fileupload 1.3.2 -> 1.3.3, commons-io 2.5 in jena-base (was pulled in anyway by jsonld-java)
github stuff Was: [2/2] jena git commit: JENA-1391: adding isEmpty method to Dataset
Andy Seaborne wrote on 10/13/17 3:40 PM: If anyone is interested in following it up, I have read that Apache projects can now use gitbox whereby all work is on Github, including the full PR cycle, and the ASF is mirrored back. To us, it looks like the GH is the master and ASF the mirror (IIRC it's a bit more complicated under the hood for INFRA than that). Andy That sounds good to me. Is this the sort of thing for which I could just file a ticket on INFRA and follow up with them? As long as we are digressing, you know what I would really love? Being able to do our docs/site in git/github. I'm pretty sure other Apache projects manage to do that... ajs6f
Re: [2/2] jena git commit: JENA-1391: adding isEmpty method to Dataset
I did exactly that -- rebase branch over master, merge branch into master, and push to apache:master (which is what I usually do). I see them being different than the commits in the PR, but I can't see for the life of me why... Anyway, I force-pushed to the PR-- that seems to have closed it. ajs6f Andy Seaborne wrote on 10/13/17 11:54 AM: Adam, I guess you pushed from your local repo to Jena Apache git repo? Maybe after a rebase? These aren't the commits on the PR. Could you pull from GH? Or otherwise tidy up the PR? (you can force push changes from your local repo to GH) Thanks Andy On 13/10/17 15:40, aj...@apache.org wrote: JENA-1391: adding isEmpty method to Dataset Project: http://git-wip-us.apache.org/repos/asf/jena/repo Commit: http://git-wip-us.apache.org/repos/asf/jena/commit/b792e8da Tree: http://git-wip-us.apache.org/repos/asf/jena/tree/b792e8da Diff: http://git-wip-us.apache.org/repos/asf/jena/diff/b792e8da Branch: refs/heads/master Commit: b792e8da1fbe7e397399f2b0803f4e28222c9c3e Parents: 32de4dc Author: ajs6f Authored: Thu Oct 12 10:18:41 2017 -0400 Committer: ajs6f Committed: Fri Oct 13 10:40:18 2017 -0400 -- jena-arq/src/main/java/org/apache/jena/query/Dataset.java| 7 +++ .../main/java/org/apache/jena/sparql/core/DatasetImpl.java | 5 + .../org/apache/jena/sparql/core/AbstractTestDataset.java | 8 3 files changed, 20 insertions(+) -- http://git-wip-us.apache.org/repos/asf/jena/blob/b792e8da/jena-arq/src/main/java/org/apache/jena/query/Dataset.java -- diff --git a/jena-arq/src/main/java/org/apache/jena/query/Dataset.java b/jena-arq/src/main/java/org/apache/jena/query/Dataset.java index db88642..539053a 100644 --- a/jena-arq/src/main/java/org/apache/jena/query/Dataset.java +++ b/jena-arq/src/main/java/org/apache/jena/query/Dataset.java @@ -113,4 +113,11 @@ public interface Dataset extends Transactional * The dataset can not be used for query after this call. */ public void close() ; + +/** + * @return Whether this {@code Dataset} is empty of graphs. 
Be aware of the semantic looseness inherent in + * https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#h_note_4";>the definition + * of RDF Datasets; whether a named graph exists if nothing is in it is implementation-specific. + */ +boolean isEmpty(); } http://git-wip-us.apache.org/repos/asf/jena/blob/b792e8da/jena-arq/src/main/java/org/apache/jena/sparql/core/DatasetImpl.java -- diff --git a/jena-arq/src/main/java/org/apache/jena/sparql/core/DatasetImpl.java b/jena-arq/src/main/java/org/apache/jena/sparql/core/DatasetImpl.java index 2216d2f..00e419a 100644 --- a/jena-arq/src/main/java/org/apache/jena/sparql/core/DatasetImpl.java +++ b/jena-arq/src/main/java/org/apache/jena/sparql/core/DatasetImpl.java @@ -209,4 +209,9 @@ public class DatasetImpl implements Dataset if ( uri == null ) throw new ARQException("null for graph name"); } + +@Override +public boolean isEmpty() { +return dsg.isEmpty(); +} } http://git-wip-us.apache.org/repos/asf/jena/blob/b792e8da/jena-arq/src/test/java/org/apache/jena/sparql/core/AbstractTestDataset.java -- diff --git a/jena-arq/src/test/java/org/apache/jena/sparql/core/AbstractTestDataset.java b/jena-arq/src/test/java/org/apache/jena/sparql/core/AbstractTestDataset.java index 0ac1dee..b55991d 100644 --- a/jena-arq/src/test/java/org/apache/jena/sparql/core/AbstractTestDataset.java +++ b/jena-arq/src/test/java/org/apache/jena/sparql/core/AbstractTestDataset.java @@ -108,4 +108,12 @@ public abstract class AbstractTestDataset extends BaseTest assertFalse(model1.isIsomorphicWith(ds.getNamedModel(graphName))) ; assertTrue(model2.isIsomorphicWith(ds.getNamedModel(graphName))) ; } + +@Test public void dataset_06() +{ +String graphName = "http://example/"; ; +Dataset ds = createDataset() ; +ds.addNamedModel(graphName, model1) ; +assertFalse("Dataset should not be empty after a named graph has been added!", ds.isEmpty()); +} }
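This explains why the pushed commits differ from the ones on the PR. A git commit id is a hash over the commit object, and that object records the parent commit id and the committer timestamp. Note the separate `Authored:` and `Committed:` dates in the header above. Rebasing onto a newer master changes both, so every rebased commit gets a new id even when its diff is identical.

A minimal sketch follows, assuming SHA-1 object ids and a simplified commit body (real commits can carry extra headers, such as signatures). The parent ids and committer identity are placeholders. The timestamps are the Authored/Committed times above, converted to epoch seconds.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class GitObjectIdDemo {

    /** Git's (SHA-1) object id: the hash of "<type> <length in bytes>\0" followed by the content. */
    public static String objectId(String type, byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update((type + " " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            sha1.update(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }

    /** A simplified commit object: the parent id and committer timestamp are part of what gets hashed. */
    public static String commitBody(String tree, String parent, String committerTime, String message) {
        return "tree " + tree + "\n"
             + "parent " + parent + "\n"
             + "author A. Committer <committer@example.org> 1507817921 -0400\n"
             + "committer A. Committer <committer@example.org> " + committerTime + "\n"
             + "\n"
             + message + "\n";
    }

    public static void main(String[] args) {
        String tree = objectId("tree", new byte[0]); // id of an empty tree, used as a stand-in
        String msg = "JENA-1391: adding isEmpty method to Dataset";
        String original = objectId("commit",
            commitBody(tree, "1111111111111111111111111111111111111111", "1507817921 -0400", msg)
                .getBytes(StandardCharsets.UTF_8));
        // Same tree, author and message; only the parent and committer time change, as in a rebase.
        String rebased = objectId("commit",
            commitBody(tree, "2222222222222222222222222222222222222222", "1507905618 -0400", msg)
                .getBytes(StandardCharsets.UTF_8));
        System.out.println(original.equals(rebased)); // false
    }
}
```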
[GitHub] jena pull request #287: JENA-1391: adding isEmpty method to Dataset
Github user ajs6f closed the pull request at: https://github.com/apache/jena/pull/287 ---
[GitHub] jena pull request #288: JENA-1401 (fuseki backup) Don't use Jetty code in wa...
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/288#discussion_r144571831 --- Diff: jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/async/AsyncPool.java --- @@ -51,7 +51,13 @@ public AsyncTask submit(Runnable task, String displayName, DataService dataServi synchronized(mutex) { String taskId = Long.toString(++counter) ; Fuseki.serverLog.info(format("Task : %s : %s",taskId, displayName)) ; -Callable c = Executors.callable(task) ; +Callable c = ()->{ +try { task.run(); } +catch (Throwable th) { +Fuseki.serverLog.warn(format("Exception in task %s execution", taskId), th); --- End diff -- Does this qualify as an error? (Logging-wise) ---
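A minimal sketch of the wrapping pattern in this diff, using `java.util.logging` instead of Fuseki's own logger so that it is self-contained (that substitution is an assumption, as are the names `LoggingCallables` and `wrap`). The log level is a parameter, which leaves the warn-vs-error question to the caller.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingCallables {
    /**
     * Hypothetical helper: adapts a Runnable to a Callable, as Executors.callable(task) does,
     * but logs any Throwable at the given level. The Throwable is then swallowed, so
     * Future.get() on the submitted task returns null instead of throwing.
     */
    public static Callable<Object> wrap(Runnable task, String taskId, Logger log, Level level) {
        return () -> {
            try {
                task.run();
            } catch (Throwable th) {
                // The open question in the review: WARNING or SEVERE (roughly "warn" vs "error").
                log.log(level, String.format("Exception in task %s execution", taskId), th);
            }
            return null;
        };
    }
}
```

Logging at all matters because a task handed to an executor whose `Future` is never inspected otherwise fails silently. An argument for the error level is that such a failure means a requested task did not complete.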
[GitHub] jena pull request #287: JENA-1391: adding isEmpty method to Dataset
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/287#discussion_r144359574 --- Diff: jena-arq/src/main/java/org/apache/jena/query/Dataset.java --- @@ -113,4 +113,9 @@ * The dataset can not be used for query after this call. */ public void close() ; + +/** + * @return Whether this {@code Dataset} is empty of triples, whether in the default graph or in any named graph. --- End diff -- @afs Better? ---
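The "empty of triples" wording has one practical advantage: it can be phrased purely in terms of quad lookup, so it could plausibly be an interface default. Here is a sketch against a hypothetical, much-reduced SPI (`ToyDatasetGraph` is not Jena's `DatasetGraph`, and quads are plain string arrays for brevity):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class TripleBasedEmptiness {
    /** Hypothetical minimal SPI: just enough to enumerate quads. */
    public interface ToyDatasetGraph {
        /** Every quad in the dataset as {graph, subject, predicate, object}; graph "" is the default graph. */
        Iterator<String[]> find();

        /** Empty of triples, whether in the default graph or in any named graph. */
        default boolean isEmpty() {
            return !find().hasNext();
        }
    }

    /** A list-backed implementation that simply inherits the default isEmpty(). */
    public static class ListDatasetGraph implements ToyDatasetGraph {
        private final List<String[]> quads = new ArrayList<>();

        public void add(String g, String s, String p, String o) {
            quads.add(new String[] {g, s, p, o});
        }

        @Override
        public Iterator<String[]> find() {
            return quads.iterator();
        }
    }
}
```

Under this definition a named graph with nothing in it does not make the dataset non-empty, because only quads are consulted.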
Re: Fuseki service extensibility
I'm not in a big hurry to work on this, but LDP access to a dataset might find that useful. ajs6f Andy Seaborne wrote on 10/12/17 11:54 AM: JENA-1400 is a small step to providing some degree of flexibility in Fuseki for adding custom services to a dataset. The JIRA is needed because currently the OperationName set is sealed. I'm not seeing this as a common thing to do. Many things are better done (e.g. data conversion) output and streamed to Fuseki. The one I have mind is implementing a patch service (and using HTTP PATCH, as well as POST) based on RDF Patch [1]. Changes to datasets can be calculated elsewhere and the Fuseki dataset changed. (It's quite hard to automatically generate SPARQL Update for arbitrary changes if there are blank nodes involved.) Any other use cases? Andy [1] https://afs.github.io/rdf-delta/
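A patch service is attractive because changes arrive as explicit add/delete rows rather than as generated SPARQL Update. The following toy applier handles a simplified, RDF-Patch-like line format: `A` adds and `D` deletes a quad written as terms followed by ` .`, and other rows (headers, transaction markers such as `TX`/`TC`) are skipped. This sketch rests on several assumptions. It does not implement the RDF Patch grammar or the rdf-delta code, terms are opaque strings, and there is no prefix or blank-node handling. `ToyPatchApplier` is a hypothetical name.

```java
import java.util.List;
import java.util.Set;

public class ToyPatchApplier {
    /**
     * Applies rows of a simplified, RDF-Patch-like format to a set of quads.
     * "A <s> <p> <o> <g> ." adds and "D ... ." deletes; the term text between the row
     * code and the final " ." is used verbatim as the quad's identity.
     */
    public static void apply(List<String> rows, Set<String> quads) {
        for (String raw : rows) {
            String row = raw.trim();
            if (row.isEmpty()) continue;
            int space = row.indexOf(' ');
            String code = space < 0 ? row : row.substring(0, space);
            String body = space < 0 ? "" : row.substring(space + 1).trim();
            if (body.endsWith(".")) body = body.substring(0, body.length() - 1).trim();
            switch (code) {
                case "A": quads.add(body); break;
                case "D": quads.remove(body); break;
                default: break; // headers and transaction markers are ignored in this sketch
            }
        }
    }
}
```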
[GitHub] jena pull request #287: JENA-1391: adding isEmpty method to Dataset
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/287#discussion_r144321650 --- Diff: jena-arq/src/main/java/org/apache/jena/query/Dataset.java --- @@ -113,4 +113,11 @@ * The dataset can not be used for query after this call. */ public void close() ; + +/** + * @return Whether this {@code Dataset} is empty of graphs. Be aware of the semantic looseness inherent in + * <a href="https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#h_note_4">the definition + * of RDF Datasets</a>; whether a named graph exists if nothing is in it is implementation-specific. + */ +boolean isEmpty(); --- End diff -- I didn't do that at first because it felt like a bit of a conflict with the rest of the API for `Dataset`, which discusses models/graphs and not tuples. But if you're okay with it, it doesn't bother me. ---
[GitHub] jena pull request #287: JENA-1391: adding isEmpty method to Dataset
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/287#discussion_r144314830 --- Diff: jena-arq/src/test/java/org/apache/jena/sparql/core/AbstractTestDataset.java --- @@ -108,4 +108,12 @@ assertFalse(model1.isIsomorphicWith(ds.getNamedModel(graphName))) ; assertTrue(model2.isIsomorphicWith(ds.getNamedModel(graphName))) ; } + +@Test public void dataset_06() +{ +String graphName = "http://example/" ; +Dataset ds = createDataset() ; +ds.addNamedModel(graphName, model1) ; +assertFalse("Dataset should not be empty after a named graph has been added!", ds.isEmpty()); +} --- End diff -- See above-- so do we need a different impl of `isEmpty` for every kind of `DatasetGraph`? ---
[GitHub] jena pull request #287: JENA-1391: adding isEmpty method to Dataset
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/287#discussion_r144314175 --- Diff: jena-arq/src/main/java/org/apache/jena/query/Dataset.java --- @@ -113,4 +113,11 @@ * The dataset can not be used for query after this call. */ public void close() ; + +/** + * @return Whether this {@code Dataset} is empty of graphs. Be aware of the semantic looseness inherent in + * <a href="https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#h_note_4">the definition + * of RDF Datasets</a>; whether a named graph exists if nothing is in it is implementation-specific. + */ +boolean isEmpty(); --- End diff -- Oh, fudge. Then we really can't have a default impl of this, can we? ---
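The following miniature illustrates why a single default is hard under the "empty of graphs" wording. Two hypothetical implementations, neither of them a Jena class, both fit the loose RDF dataset definition: one remembers graph names, the other stores only quads. After a named graph with no triples in it has been added, they give different answers.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NamedGraphEmptiness {
    /** Hypothetical minimal interface; there is deliberately no default isEmpty(). */
    public interface ToyDataset {
        void addNamedGraph(String name, Set<String> triples);
        boolean isEmpty();
    }

    /** Remembers graph names, so an empty named graph still "exists". */
    public static class GraphNameDataset implements ToyDataset {
        private final Map<String, Set<String>> graphs = new HashMap<>();

        @Override
        public void addNamedGraph(String name, Set<String> triples) {
            graphs.computeIfAbsent(name, k -> new HashSet<>()).addAll(triples);
        }

        @Override
        public boolean isEmpty() {
            return graphs.isEmpty();
        }
    }

    /** Stores only quads, so a named graph with no triples leaves no trace. */
    public static class QuadStoreDataset implements ToyDataset {
        private final Set<String> quads = new HashSet<>();

        @Override
        public void addNamedGraph(String name, Set<String> triples) {
            for (String t : triples) quads.add(name + " " + t);
        }

        @Override
        public boolean isEmpty() {
            return quads.isEmpty();
        }
    }
}
```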
[GitHub] jena pull request #287: JENA-1391: adding isEmpty method to Dataset
GitHub user ajs6f opened a pull request: https://github.com/apache/jena/pull/287 JENA-1391: adding isEmpty method to Dataset One of the asks from JENA-1391. Not by any means the whole ticket. :) You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajs6f/jena JENA-1391isEmpty Alternatively you can review and apply these changes as the patch at: https://github.com/apache/jena/pull/287.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #287 commit 58653f788f76d8ebb76b4d97a1f75d2f1027824a Author: ajs6f Date: 2017-10-12T14:18:41Z JENA-1391: adding isEmpty method to Dataset ---
Re: Obfuscation Support?
I think that having the tooling available would be nothing but good. (Well, except for the hard work that Rob will have to do to make it happen. :g:) And I agree with Andy that we want to be careful about how we present it-- managing expectations is key. Perhaps we can make a point of providing the tooling in a way that moves users through some thinking about MCVE provision and so forth? I'm just imagining a page on the site where you get the tool, with that link wrapped in some useful guidance explaining the limitations that Andy discussed, how to be sure you are asking your question in a way that will get the best answers, etc. Do we perhaps need to consider how we could make clear that there is an ability to purchase support from external vendors? Would it be possible to have a page on the website that provides a list of known support vendors, obviously with the appropriate disclaimers around non-endorsement, neutrality etc and the ability for anyone who asks to have their company listed? +1! I bet we can do this, well within Apache boundaries. For example, there are plenty of pages like: https://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support ajs6f Rob Vesse wrote on 10/12/17 9:21 AM: My intention was not for us to start offering a debugging service nor to stop expecting users to provide a minimal complete example. My thinking is that it provides a way to help users in providing a complete example; I was not expecting that they would use it to submit their entire data sets. And clearly obfuscation does have limits, particularly when you consider things like typed literals, where you almost need to leave them alone in order for the obfuscated outputs to have any semblance of meaning and usefulness. I totally agree that none of us has the time to dive into detailed debugging of users' problems. Do we perhaps need to consider how we could make clear that there is an ability to purchase support from external vendors?
Would it be possible to have a page on the website that provides a list of known support vendors, obviously with the appropriate disclaimers around non-endorsement, neutrality etc and the ability for anyone who asks to have their company listed? Rob On 12/10/2017 12:36, "Andy Seaborne" wrote: Good question. It might be valuable to add to the collection of tools. I do have some concern about what we are offering here, though. (1) If we offer to look at large datasets and/or large log files, then work is moving from the user to the list. (2) The obfuscated data is public. We don't want any commitment/liability here that the code is, say, suitable for personal data, because sometimes obfuscation is not enough. On the first point: Part of an MCVE [1] is the user doing some work. If we make it acceptable to bypass that, the work still exists but it has been transferred. I simply can't spend 1+ hour setting up a test environment. Performance can involve load as well and I don't have the infrastructure to look at that. I'm more willing to spend time if the user is in a university/non-profit or for people, commercial or otherwise, who engage in useful discussion. A good report is a contribution. But I'm not willing (or even able) to subsidise commercial organisations per se. They can go find and pay for a commercial support contract or contract with someone (a contributor/committer maybe) and have a confidentiality agreement. It is not always one question in isolation. Solve one issue and then another arrives. Sorry if this is grumpy but I can see ways things might turn out not so well without us also having common agreement about how we operate on users@. Andy [1] and point to https://stackoverflow.com/help/mcve PS There is also a theme of "ask first" before trying anything, or doing a few minutes of investigation. Such emails are vague.
On 12/10/17 10:03, Rob Vesse wrote:
> Folks
>
> An occasional recurring theme I see on the users list is that we get a vague question about performance details where users can’t/won’t share data and queries because of confidentiality or other concerns. This is something we’ve encountered in the past with customers for our commercial products, and so internally we developed some obfuscation code using Jena APIs so that we can obfuscate queries and data in our logs, allowing customers to share these without confidentiality being breached.
>
> Would it be valuable to the project if we cleaned this up and made it a part of core Jena libraries?
>
> It would probably take a bit of time to unpick this from our code and to generalise it, but I think it could be a very useful feature going forward. Let me know what you think.
>
> Rob
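Here is a rough sketch of the kind of obfuscation under discussion, assuming terms arrive as already-parsed strings. IRIs and simple string literals are replaced by a salted SHA-256 digest, so the same input always maps to the same output and joins in a shared query still line up. Typed literals pass through unchanged, following Rob's point that values such as numbers or dates lose their meaning otherwise. `TermObfuscator` and the `http://example.org/obf/` namespace are made up for illustration. As Andy notes, this gives no guarantee of confidentiality: short or guessable values can be recovered by hashing candidate inputs, which is one reason the salt has to stay private.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical term obfuscator; not part of Jena. */
public class TermObfuscator {
    private final byte[] salt;
    private final Map<String, String> iriCache = new HashMap<>();

    public TermObfuscator(String salt) {
        this.salt = salt.getBytes(StandardCharsets.UTF_8);
    }

    /** Replaces an IRI with a stable opaque IRI: same input, same output. */
    public String obfuscateIri(String iri) {
        return iriCache.computeIfAbsent(iri, k -> "http://example.org/obf/" + digest(k));
    }

    /** Simple literals (datatypeIri == null) are hashed; typed literals are left alone. */
    public String obfuscateLiteral(String lexicalForm, String datatypeIri) {
        return datatypeIri != null ? lexicalForm : digest(lexicalForm);
    }

    /** First 8 bytes of SHA-256(salt + value), as 16 lowercase hex characters. */
    private String digest(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(salt);
            byte[] d = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 8; i++) sb.append(String.format("%02x", d[i]));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```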
Re: TDB2 merged
Okay, that makes sense. We might even just swap the "namespaces" at some future point when TDB2 becomes the default, i.e. go to tdbquery being for TDB2 and there being a tdb1.tdbquery, as a stop on the road to deprecation. ajs6f Andy Seaborne wrote on 10/7/17 9:42 AM: On 06/10/17 21:17, aj...@apache.org wrote: The commands are in the binary distribution "apache-jena" download but there are no script wrappers (easy to copy and fix though). Just a thought-- maybe better to add flags to the current scripts? Having all-new loader scripts for TDB2 would make for three different bulk loader scripts... Maybe though it's not so simple a thing to do as the scripts are a general wrapper template to call the java code. For now, the TDB2 commands are of the form "tdb2.tdb*" tdb2.tdbquery ... Sometime, detecting the database type would be great but not critical path for the 3.5.0. Andy ajs6f Andy Seaborne wrote on 10/6/17 7:36 AM: That would be very helpful. "documentation" is a task in the next few days. It's the block on sending any messages to users@ etc about it. The raw material is in git: https://github.com/apache/jena/blob/master/jena-db/use-fuseki-tdb2.md https://github.com/apache/jena/blob/master/jena-db/use-tdb2-cmds.md The commands are in the binary distribution "apache-jena" download but there are no script wrappers (easy to copy and fix though). Either run from development or java -cp 'DIR/lib/*' tdb2.tdbloader ... args ... some of my data files are too big to be loaded via the Graph Store API. From TDB2 and Fuseki's point of view, that's no longer true. You can (should be able to) load any amount. The fuseki-basic server also has TDB2 in it so if you are doing everything script-driven, you can run that "--conf config-tdb2.ttl" There is no progress indicator in the server log so you may wish to set some kind of verbose option in the sender. Andy Uploading large files: The UI does this all quite well. What's the magic for a command line/scripted process?
It needs a tool that does not buffer or inspect the file or otherwise try to be helpful. Anyone know of good tools for this? I haven't managed to work out which set of "curl" arguments do this without buffering the file (--data* seem to buffer the file; -F is a form upload, not pure POST). This seems to work: wget --post-file=/home/afs/Datasets/BSBM/bsbm-200m.nt --header 'Content-type: application/n-triples' http://localhost:3030/data 200M BSBM (49Gbytes) loaded at 42K triples/s. The content length in the Fuseki log is reported wrongly (1002691465 ... int/long error) but the triple count is right. It does ruin the interactive performance of the machine! s-post crashes immediately if given a large file - don't know why. On 06/10/17 07:50, Osma Suominen wrote: Excellent! I have a couple of Fuseki installations where I could test drive this. I'd just need to know how to do the configuration, and also a tool like tdbloader for offline loading since some of my data files are too big to be loaded via the Graph Store API. No hurry though. -Osma Andy Seaborne wrote on 04.10.2017 at 00:43: It's in the build joined in at apache-jena-libs. It is in Fuseki2 server jar, but not the UI - a user needs to use a configuration file. That also works in fuseki-basic. Documentation to follow. Andy
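For scripted uploads from Java, the standard way to stop `HttpURLConnection` from buffering is to give it the body length up front with `setFixedLengthStreamingMode(long)`. Without that call, it buffers the whole request body in memory to compute `Content-Length`. The sketch keeps the length as a `long` throughout: narrowing a multi-gigabyte size to an `int` silently wraps around, which is the same class of int/long error Andy saw in the log. `StreamingPost` is a hypothetical helper; the endpoint URL and content type are plain parameters, not a Fuseki-specific API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: POST a file's bytes to an HTTP endpoint without buffering the body. */
public class StreamingPost {
    public static int post(Path file, URL endpoint, String contentType) throws IOException {
        long length = Files.size(file); // long: sizes above 2 GB must not be truncated to int
        HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", contentType);
        // Declaring the length up front lets the body stream straight to the socket.
        conn.setFixedLengthStreamingMode(length);
        try (InputStream in = Files.newInputStream(file);
             OutputStream out = conn.getOutputStream()) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }
}
```

Usage would look like `StreamingPost.post(Paths.get("bsbm-200m.nt"), URI.create("http://localhost:3030/data").toURL(), "application/n-triples")`, with the server address taken from the wget example above.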
Re: TDB2 merged
The commands are in the binary distribution "apache-jena" download but there are no script wrappers (easy to copy and fix though). Just a thought-- maybe better to add flags to the current scripts? Having all-new loader scripts for TDB2 would make for three different bulk loader scripts... ajs6f Andy Seaborne wrote on 10/6/17 7:36 AM: That would be very helpful. "documentation" is a task in the next few days. It's the block on sending any messages to users@ etc about it. The raw material is in git: https://github.com/apache/jena/blob/master/jena-db/use-fuseki-tdb2.md https://github.com/apache/jena/blob/master/jena-db/use-tdb2-cmds.md The commands are in the binary distribution "apache-jena" download but there are no script wrappers (easy to copy and fix though). Either run from development or java -cp 'DIR/lib/*' tdb2.tdbloader ... args ... some of my data files are too big to be loaded via the Graph Store API. From TDB2 and Fuseki's point of view, that's no longer true. You can (should be able to) load any amount. The fuseki-basic server also has TDB2 in it so if you are doing everything script-driven, you can run that "--conf config-tdb2.ttl" There is no progress indicator in the server log so you may wish to set some kind of verbose option in the sender. Andy Uploading large files: The UI does this all quite well. What's the magic for a command line/scripted process? It needs a tool that does not buffer or inspect the file or otherwise try to be helpful. Anyone know of good tools for this? I haven't managed to work out which set of "curl" arguments do this without buffering the file (--data* seem to buffer the file; -F is a form upload, not pure POST). This seems to work: wget --post-file=/home/afs/Datasets/BSBM/bsbm-200m.nt --header 'Content-type: application/n-triples' http://localhost:3030/data 200M BSBM (49Gbytes) loaded at 42K triples/s. The content length in the Fuseki log is reported wrongly (1002691465 ...
int/long error) but the triple count is right. It does ruin the interactive performance of the machine! s-post crashes immediately if given a large file - don't know why. On 06/10/17 07:50, Osma Suominen wrote: Excellent! I have a couple of Fuseki installations where I could test drive this. I'd just need to know how to do the configuration, and also a tool like tdbloader for offline loading since some of my data files are too big to be loaded via the Graph Store API. No hurry though. -Osma Andy Seaborne wrote on 04.10.2017 at 00:43: It's in the build joined in at apache-jena-libs. It is in Fuseki2 server jar, but not the UI - a user needs to use a configuration file. That also works in fuseki-basic. Documentation to follow. Andy
Re: [DRAFT] Jena report - October 2017
+1 ajs6f Andy Seaborne wrote on 10/5/17 8:48 AM:
## Description:
Jena is a framework for developing Semantic Web and Linked Data applications in Java. It provides implementation of W3C standards for RDF and SPARQL.
## Issues:
There are no issues requiring board attention at this time.
## Activity:
Jena released version 3.4.0 2017-07-17
The project has received a software contribution of a new storage subsystem. Software grants from the main developer (who is also a committer but this work was not done at Apache) and his employer, for most of the development period, have been obtained. This work was originally funded by a UK government R&D grant, with the condition the work was open source.
## Health report:
The activity levels look normal.
## PMC changes:
- Currently 12 PMC members.
- No new PMC members added in the last 3 months
- Last PMC addition was Adam Soroka on Mon Jun 06 2016
## Committer base changes:
- Currently 15 committers.
- No new committers added in the last 3 months
- Last committer addition was Lorenz Buehmann at Fri Oct 28 2016
## Releases:
- Last release was 3.4.0 on 2017-07-17
## JIRA activity:
- 29 JIRA tickets created in the last 3 months
- 30 JIRA tickets closed/resolved in the last 3 months
[GitHub] jena pull request #282: JENA-1393: Format prefix names
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/282#discussion_r142396325 --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/util/FmtUtils.java --- @@ -535,6 +535,7 @@ private static boolean validPNameChar(char ch) { if ( Character.isLetterOrDigit(ch) ) return true ; if ( ch == '.' )return true ; +if ( ch == ':' )return true ; --- End diff -- Oh, missed that. Yeah, I was reading some Scala the other day and it made me sad to come back to ADT-less Java. ---
[GitHub] jena pull request #282: JENA-1393: Format prefix names
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/282#discussion_r142186147 --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/util/FmtUtils.java --- @@ -535,6 +535,7 @@ private static boolean validPNameChar(char ch) { if ( Character.isLetterOrDigit(ch) ) return true ; if ( ch == '.' )return true ; +if ( ch == ':' )return true ; --- End diff -- This is getting long enough that it might read better as a `switch`/`case`. ---
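The `switch`/`case` shape suggested here might look roughly like this. The sketch includes only the cases visible in the diff; the diff is truncated, so the real `FmtUtils.validPNameChar` may accept further characters, and `PNameChars` is a hypothetical wrapper class.

```java
/** Hypothetical wrapper; only the character cases visible in the diff are included. */
public class PNameChars {
    public static boolean validPNameChar(char ch) {
        if (Character.isLetterOrDigit(ch))
            return true;
        // One case label per accepted punctuation character keeps the list easy to extend.
        switch (ch) {
            case '.':
            case ':':
                return true;
            default:
                return false;
        }
    }
}
```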
[GitHub] jena pull request #281: Slice out old Codehaus JXR Maven plugin invocation
GitHub user ajs6f opened a pull request: https://github.com/apache/jena/pull/281 Slice out old Codehaus JXR Maven plugin invocation You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajs6f/jena FixJXRPlugin Alternatively you can review and apply these changes as the patch at: https://github.com/apache/jena/pull/281.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #281 commit 4f925daf6b5dc31e6bd2faefdac9885fb9d3940f Author: ajs6f Date: 2017-10-01T18:04:41Z Slice out old Codehaus JXR Maven plugin invocation ---
Re: Codehaus JXR missing?
Sure, as long as it doesn't seem that there is any actual reason for it. (and it doesn't) ajs6f Andy Seaborne wrote on 10/1/17 1:43 PM: especially as the Apache one is set up in jena-project/pom.xml. Do you want to go and fix this? Andy On 01/10/17 15:35, aj...@apache.org wrote: I just made a minor PR (content doesn't really matter) and the Travis CI build is repeatedly showing: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:2.10:resolve-plugins (resolve-plugins) on project jena-maven-tools: Nested:: Could not find artifact org.codehaus.mojo:jxr-maven-plugin:jar:1.5 in central (https://repo.maven.apache.org/maven2) [ERROR] which seems kind of weird, both in that a plugin would be missing and that we seem to be using a Codehaus version of JXR-- anyone know if there is a particular reason we don't use the Apache version: http://maven.apache.org/jxr/maven-jxr-plugin/ ?
Codehaus JXR missing?
I just made a minor PR (content doesn't really matter) and the Travis CI build is repeatedly showing: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:2.10:resolve-plugins (resolve-plugins) on project jena-maven-tools: Nested:: Could not find artifact org.codehaus.mojo:jxr-maven-plugin:jar:1.5 in central (https://repo.maven.apache.org/maven2) [ERROR] which seems kind of weird, both in that a plugin would be missing and that we seem to be using a Codehaus version of JXR-- anyone know if there is a particular reason we don't use the Apache version: http://maven.apache.org/jxr/maven-jxr-plugin/ ? -- ajs6f
[GitHub] jena pull request #280: Deprecate Jena's Callback in favor of Java API's ...
GitHub user ajs6f opened a pull request: https://github.com/apache/jena/pull/280 Deprecate Jena's Callback in favor of Java API's Consumer Seems like we could deprecate `Callback` in the next release and remove in the following, unless I am missing something about the contract for callbacks. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajs6f/jena CallbackToConsumer Alternatively you can review and apply these changes as the patch at: https://github.com/apache/jena/pull/280.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #280 commit 26592ee47d2203cfa8165d350c4f07908e760ee0 Author: ajs6f Date: 2017-10-01T14:00:06Z Deprecate Jena's Callback in favor of Java API's Consumer ---
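One possible low-friction deprecation path, sketched with a hypothetical stand-in for Jena's `Callback` (its single method is assumed here to be `proc(T)`): the deprecated interface extends `java.util.function.Consumer` and bridges `accept` to the old method. Existing implementations can then be passed to new `Consumer`-based APIs during the deprecation cycle, and callers can migrate at their own pace.

```java
import java.util.function.Consumer;

public class CallbackMigration {
    /**
     * Hypothetical stand-in for the deprecated Callback; the real method name may differ.
     * Extending Consumer means every legacy Callback is already a Consumer.
     */
    @Deprecated
    @FunctionalInterface
    public interface Callback<T> extends Consumer<T> {
        void proc(T arg);

        @Override
        default void accept(T arg) {
            proc(arg); // bridge the standard method to the legacy one
        }
    }

    /** New-style API that takes any Consumer, legacy Callbacks included. */
    public static <T> void forEachItem(Iterable<T> items, Consumer<? super T> action) {
        for (T item : items) {
            action.accept(item);
        }
    }
}
```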
Re: [VOTE] Accept contribution of TDB2
+1 ajs6f Osma Suominen wrote on 9/22/17 10:25 AM: Andy Seaborne wrote on 22.09.2017 at 16:55: This VOTE is to accept a contribution of software for TDB2 comprising the contents of the GitHub repository: https://github.com/afs/mantis as of commit 71a70fd76ebc35cda26258bad0459e97f9860b04 (2017-09-22) subject to software grants from Epimorphics Ltd and Andy Seaborne, which cover the entire contribution. Please vote to approve receiving this contribution: [ ] +1 Accept the contribution [ ] -1 Don't accept the contribution because ... +1 -Osma
Re: eclipse and shaded guava?
This is a long-standing annoyance caused by our need to shade a modern version of Guava into the code to avoid conflicting with the very old version in Hadoop. Do you have the jena-shaded-guava project open in Eclipse? The problem usually goes away if it is closed. ajs6f Chris Tomlinson wrote on 9/11/17 1:47 PM: Hi, I’m having a bit of a hassle getting eclipse Mars 4.5.2 to hook up properly with imports like: import org.apache.jena.ext.com.google.common.cache.CacheBuilder ; import org.apache.jena.ext.com.google.common.cache.CacheStats ; I "git clone" jena, run "mvn clean install" and "mvn eclipse:eclipse", and then import the various submodules as existing maven projects into eclipse. Once the imports complete there are a few of the submodules with syntax errors in eclipse centered on the shaded guava. The projects with errors all have jena-shaded-guava as a project dependency in the .project and also a library reference to M2_REPO/com/google/guava/guava/21.0/guava-21.0.jar in the .classpath. The jena repo and submodules build and test fine from the command line. I’ve run maven update project on all of the jena projects and once the “update project” process completes the errors are cleared (a result of “clean projects” being checked) from all of the projects and then during the “building workspace” process the errors reappear one-by-one as the workspace is rebuilt. I appreciate any ideas about what I’m stumbling on. Thanks, Chris
Re: Jena over Cassandra?
No, I had not seen that, thanks! Looks very interesting! ajs6f Phil Coates wrote on 9/5/17 11:04 AM: Have you looked at CM-Well (https://github.com/thomsonreuters/CM-Well)? This is based on Cassandra and ElasticSearch. *Philip Coates* philip.coa...@semanticintegration.co.uk philip.coa...@sparqlr.com skype:philip.coates.76 Tel: +44 (0)7711 818384 *SemanticIntegration* <http://www.semanticintegration.co.uk/> On 5 September 2017 at 15:40, <aj...@apache.org> wrote: The requirements for distributed storage are actually that DRAS-TIC (see that grant description) be used, and DRAS-TIC is 100% based around Cassandra, so effectively, the requirement is that Cassandra be used, at least at core. So part of what I am wondering (if it's not obvious) is "If we're going to have a Cassandra cluster as part of this, how can we get as much mileage as possible out of it?" I know that Cassandra offers some ordering capabilities out-of-the-box, although I'm not familiar with them. Maybe they could be used to support merge join generally. CumulusRDF (as shown in that paper I forwarded) uses a structure in which they mostly leave column values empty. The information is stored entirely in the keys, and use is made of prefix lookup. Does your system do something like that, Claude? It sounds like you are storing tuple components in the column values. ajs6f Andy Seaborne wrote on 9/5/17 4:43 AM: On Mon, Sep 4, 2017 at 12:10 PM, <aj...@apache.org> wrote: Little of both? :grin: Primarily I am interested because of a grant [1] in which the Smithsonian Institution (where I work) is participating in a supporting role (partly because I convinced us to).
That work involves using Cassandra for distributed storage, and it will also involve a distributed LDP implementation (the Fedora API referred to in that grant description is really just a packaging of Memento [2] with LDP [3]), hence my interest in jena-on-cassandra. Turning this round - what are the requirements for the distributed storage? As I understand the join question, the usual move with Cassandra is to denormalize and store the joined data together, but that's obviously nontrivial in our situation, where we don't know the potential queries. Have you looked at an indexing solution such as was used by CumulusRDF [4]? (single graph example) If Cassandra has stored PSO and POS then parallel merge joins are possible. Andy ajs6f [1] https://www.imls.gov/grants/awarded/lg-71-17-0159-17 [2] http://www.mementoweb.org/guide/quick-intro/ [3] https://www.w3.org/TR/ldp/ [4] http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf Claude Warren wrote on 9/2/17 12:44 PM: are you looking to use jena-on-cassandra or do you have ideas? what leads you to ask about it? On Sat, Sep 2, 2017 at 1:21 PM, <aj...@apache.org> wrote: Hey, Claude-- Just curious as to where https://github.com/Claudenw/jena-on-cassandra has ended up. Is that still work-in-progress? -- ajs6f -- I like: Like Like - The likeliest place on the web <http://like-like.xenei.com> LinkedIn: http://www.linkedin.com/in/claudewarren
Re: Jena over Cassandra?
The requirements for distributed storage are actually that DRAS-TIC (see that grant description) be used, and DRAS-TIC is 100% based around Cassandra, so effectively, the requirement is that Cassandra be used, at least at core. So part of what I am wondering (if it's not obvious) is "If we're going to have a Cassandra cluster as part of this, how can we get as much mileage as possible out of it?" I know that Cassandra offers some ordering capabilities out-of-the-box, although I'm not familiar with them. Maybe they could be used to support merge join generally. CumulusRDF (as shown in that paper I forwarded) uses a structure in which they mostly leave column values empty. The information is stored entirely in the keys, and use is made of prefix lookup. Does your system do something like that, Claude? It sounds like you are storing tuple components in the column values. ajs6f Andy Seaborne wrote on 9/5/17 4:43 AM: On Mon, Sep 4, 2017 at 12:10 PM, wrote: Little of both? :grin: Primarily I am interested because of a grant [1] in which the Smithsonian Institution (where I work) is participating in a supporting role (partly because I convinced us to). That work involves using Cassandra for distributed storage, and it will also involve a distributed LDP implementation (the Fedora API referred to in that grant description is really just a packaging of Memento [2] with LDP [3]), hence my interest in jena-on-cassandra. Turning this round - what are the requirements for the distributed storage? As I understand the join question, the usual move with Cassandra is to denormalize and store the joined data together, but that's obviously nontrivial in our situation, where we don't know the potential queries. Have you looked at an indexing solution such as was used by CumulusRDF [4]? (single graph example) If Cassandra has stored PSO and POS then parallel merge joins are possible.
Andy ajs6f [1] https://www.imls.gov/grants/awarded/lg-71-17-0159-17 [2] http://www.mementoweb.org/guide/quick-intro/ [3] https://www.w3.org/TR/ldp/ [4] http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf Claude Warren wrote on 9/2/17 12:44 PM: are you looking to use jena-on-cassandra or do you have ideas? what leads you to ask about it? On Sat, Sep 2, 2017 at 1:21 PM, wrote: Hey, Claude-- Just curious as to where https://github.com/Claudenw/jena-on-cassandra has ended up. Is that still work-in-progress? -- ajs6f
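The following toy illustrates the two ideas in this thread. First, the whole tuple is stored in a sorted key, and lookups are answered by a prefix range scan (as described for CumulusRDF). Second, Andy's point: with a PSO index, scans for two predicates both come back sorted by subject and can be merge-joined. The parallelism across partitions is not shown. `KeyOnlyIndex` is hypothetical, and a `TreeSet` stands in for Cassandra's sorted key order. Nothing here reflects the actual schema of CumulusRDF or jena-on-cassandra.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical key-only index: each tuple is a composite key such as P|S|O, with no column values. */
public class KeyOnlyIndex {
    // Separator sorts below every character expected in a term, so key order equals tuple order.
    private static final char SEP = '\u0001';
    private static final char AFTER_SEP = '\u0002';

    private final NavigableSet<String> keys = new TreeSet<>(); // stand-in for sorted storage keys

    public void add(String first, String second, String third) {
        keys.add(first + SEP + second + SEP + third);
    }

    /** Prefix lookup: every (second, third) stored under 'first', sorted by 'second'. */
    public List<String[]> scan(String first) {
        List<String[]> out = new ArrayList<>();
        for (String key : keys.subSet(first + SEP, true, first + AFTER_SEP, false)) {
            String[] parts = key.split(String.valueOf(SEP), -1);
            out.add(new String[] {parts[1], parts[2]});
        }
        return out;
    }

    /** Merge join of two scans sorted on column 0 (the subject, for PSO): {subject, left, right}. */
    public static List<String[]> mergeJoin(List<String[]> left, List<String[]> right) {
        List<String[]> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            String ls = left.get(i)[0];
            String rs = right.get(j)[0];
            int cmp = ls.compareTo(rs);
            if (cmp < 0) {
                i++;
            } else if (cmp > 0) {
                j++;
            } else {
                // Find the run of equal subjects on each side, then emit their cross product.
                int iEnd = i;
                while (iEnd < left.size() && left.get(iEnd)[0].equals(ls)) iEnd++;
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd)[0].equals(rs)) jEnd++;
                for (int x = i; x < iEnd; x++)
                    for (int y = j; y < jEnd; y++)
                        out.add(new String[] {ls, left.get(x)[1], right.get(y)[1]});
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }
}
```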
Re: Jena over Cassandra?
Little of both? :grin: Primarily I am interested because of a grant [1] in which the Smithsonian Institution (where I work) is participating in a supporting role (partly because I convinced us to). That work involves using Cassandra for distributed storage, and it will also involve a distributed LDP implementation (the Fedora API referred to in that grant description is really just a packaging of Memento [2] with LDP [3]), hence my interest in jena-on-cassandra. As I understand the join question, the usual move with Cassandra is to denormalize and store the joined data together, but that's obviously nontrivial in our situation, where we don't know the potential queries. Have you looked at an indexing solution such as was used by CumulusRDF [4]? ajs6f [1] https://www.imls.gov/grants/awarded/lg-71-17-0159-17 [2] http://www.mementoweb.org/guide/quick-intro/ [3] https://www.w3.org/TR/ldp/ [4] http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf Claude Warren wrote on 9/2/17 12:44 PM: are you looking to use jena-on-cassandra or do you have ideas? what leads you to ask about it? On Sat, Sep 2, 2017 at 1:21 PM, wrote: Hey, Claude-- Just curious as to where https://github.com/Claudenw/jena-on-cassandra has ended up. Is that still work-in-progress? -- ajs6f
Jena over Cassandra?
Hey, Claude-- Just curious as to where https://github.com/Claudenw/jena-on-cassandra has ended up. Is that still work-in-progress? -- ajs6f
[GitHub] jena issue #233: Added mosaic and thrift packages to org.apache.jena.sparql....
Github user ajs6f commented on the issue: https://github.com/apache/jena/pull/233 @afs Okay, now I see what you mean. Yeah, insofar as this gear is trying to "federate" `DatasetGraph`s, it doesn't make sense to penetrate that abstraction to reach `TriTable` and `HexTable`, which are really just implementation constructs for TIM. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
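Here is a minimal sketch of the layering described above, using a hypothetical `QuadSource` interface as a stand-in for the public `DatasetGraph` contract. The federating view only calls `find` on each member and never reaches into storage internals such as TIM's `TriTable`/`HexTable`. It is a plain union-all over members; real federation would also need deduplication and query pushdown, which are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class FederatedView {
    /** Hypothetical stand-in for the public DatasetGraph contract: pattern lookup only. */
    public interface QuadSource {
        List<String> find(Predicate<String> pattern);
    }

    private final List<QuadSource> members;

    public FederatedView(List<QuadSource> members) {
        this.members = new ArrayList<>(members);
    }

    /** Union of matches from every member, in member order; duplicates are not removed. */
    public List<String> find(Predicate<String> pattern) {
        List<String> out = new ArrayList<>();
        for (QuadSource member : members) {
            out.addAll(member.find(pattern));
        }
        return out;
    }
}
```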
[GitHub] jena issue #233: Added mosaic and thrift packages to org.apache.jena.sparql....
Github user ajs6f commented on the issue: https://github.com/apache/jena/pull/233 @afs No, I haven't fooled with it at all because I didn't want to spend that time until @dick-twocows confirmed that it was ready for other eyes. Re: `StreamRDFTriHexTable`: I didn't see that in `afs/jena:master` or in `afs/mantis:master`-- where is it? I'm certainly +1 to @afs's comments about it being better to have some new modules than more code in the core, although I think distributed operation will be very important in the future, and I could imagine this stuff migrating into the core at some point. @afs is asking for some clarity on how this stuff is laid out-- one way to provide that might be for @dick-twocows to add a package comment to each package with a solid description of what that package does.
[GitHub] jena issue #274: JENA-1381: Use all information in the cache key (text queri...
Github user ajs6f commented on the issue: https://github.com/apache/jena/pull/274 I'm not that worried about this case. (I would actually have fewer special graph names and more types, but that's just my taste; I'm not arguing that we should change that now.) It was more your first remark about "Else, we'd end up with `Optional` all over the place" and the fact that I don't feel like we have a clear way to make any changes at all to the core SPI and API. This (PR) isn't really the right place for the larger discussion-- I'll take it to dev@.
[GitHub] jena issue #233: Added mosaic and thrift packages to org.apache.jena.sparql....
Github user ajs6f commented on the issue: https://github.com/apache/jena/pull/233 Hey, @dick-twocows and @afs, just picking up this conversation. Thanks for the work so far, @dick-twocows! Do you feel like this is in a state ready for in-depth review, or are you still working on it? @afs, does @dick-twocows's comment above give a good sense of the contribution, or were you looking for something more in-depth? I think it makes a good outline, and there's not much point in filling in a lot of detail until we are sure the contribution is close to finished. I think it would be great to get this into the next release, and I would be happy to a) work with @dick-twocows to help make that happen and b) cut that release. (Although, as I never tire of complaining, it would also be great for another committer to do that :) ).
[GitHub] jena issue #274: JENA-1381: Use all information in the cache key (text queri...
Github user ajs6f commented on the issue: https://github.com/apache/jena/pull/274 If we ever want to use `Optional` at all (and I would; I think it is clear and avoids special names in many cases), we have to start somewhere (or we have to make a massive sudden change to the Graph SPI, maybe in 4.0?). I don't want to make a fuss; I would just like to be able to introduce it gradually. Maybe not on this PR, and maybe gradual isn't better than a big change at 4.0.
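A small illustration of the trade-off being discussed, assuming a hypothetical lookup from labels to graph names: the sentinel style returns a special name that every caller must know to check for, while the `Optional` style makes absence part of the return type. The class, methods, and sentinel IRI are invented for the example and are not Jena API.

```java
import java.util.Map;
import java.util.Optional;

final class GraphNameLookupSketch {
    // Sentinel style: an invented "special name" meaning "no such graph".
    static final String NO_GRAPH = "urn:x-example:no-graph";

    private final Map<String, String> graphsByLabel;

    GraphNameLookupSketch(Map<String, String> graphsByLabel) {
        this.graphsByLabel = graphsByLabel;
    }

    // Callers must remember to compare the result against NO_GRAPH.
    String graphNameOrSentinel(String label) {
        return graphsByLabel.getOrDefault(label, NO_GRAPH);
    }

    // Absence is part of the return type; no special name is needed.
    Optional<String> graphName(String label) {
        return Optional.ofNullable(graphsByLabel.get(label));
    }

    public static void main(String[] args) {
        GraphNameLookupSketch g =
            new GraphNameLookupSketch(Map.of("people", "http://example.org/graphs/people"));
        System.out.println(g.graphName("people").orElse("none")); // http://example.org/graphs/people
        System.out.println(g.graphName("missing").isPresent());   // false
        System.out.println(g.graphNameOrSentinel("missing"));     // urn:x-example:no-graph
    }
}
```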
[GitHub] jena pull request #275: JENA-1383: Improve handling of bad character encodin...
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/275#discussion_r134117416 --- Diff: jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/servlets/ActionSPARQL.java --- @@ -205,7 +206,11 @@ public static void parse(HttpAction action, StreamRDF dest, InputStream input, L .lang(lang) .base(base) .parse(dest); -} +} catch (RuntimeIOException ex) { +if ( ex.getCause() instanceof CharacterCodingException ) +throw new RiotException("Character Coding Error: "+ex.getMessage()); --- End diff -- maybe `throw new RiotException("Character Coding Error: "+ex.getMessage(), ex.getCause());` to keep more context?
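A self-contained sketch of why passing the cause helps: the wrapping exception keeps the original `CharacterCodingException` reachable through `getCause()`, stack trace included. `java.io.UncheckedIOException` stands in for Jena's `RuntimeIOException`, and `ParseError` is a hypothetical stand-in for `RiotException`, assuming it offers a (message, cause) constructor.

```java
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.MalformedInputException;

final class CauseChainSketch {
    // Hypothetical stand-in for RiotException with a (message, cause) constructor.
    static final class ParseError extends RuntimeException {
        ParseError(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // body plays the role of the parser call; UncheckedIOException stands in for RuntimeIOException.
    static void parse(Runnable body) {
        try {
            body.run();
        } catch (UncheckedIOException ex) {
            if (ex.getCause() instanceof CharacterCodingException)
                // Passing the cause keeps the original exception and its stack trace attached.
                throw new ParseError("Character Coding Error: " + ex.getMessage(), ex.getCause());
            throw ex;
        }
    }

    public static void main(String[] args) {
        try {
            parse(() -> { throw new UncheckedIOException(new MalformedInputException(1)); });
        } catch (ParseError e) {
            // Prints the cause's class name: MalformedInputException
            System.out.println(e.getCause().getClass().getSimpleName());
        }
    }
}
```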
[GitHub] jena pull request #273: JENA-1372: fn:apply and fn:collation-key
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/273#discussion_r133993197 --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/function/library/FN_Apply.java --- @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.jena.sparql.function.library; + +import java.util.List ; + +import org.apache.jena.atlas.lib.Cache ; +import org.apache.jena.atlas.lib.CacheFactory ; +import org.apache.jena.graph.Node ; +import org.apache.jena.sparql.expr.ExprEvalException ; +import org.apache.jena.sparql.expr.ExprList ; +import org.apache.jena.sparql.expr.NodeValue ; +import org.apache.jena.sparql.function.Function ; +import org.apache.jena.sparql.function.FunctionBase ; +import org.apache.jena.sparql.function.FunctionFactory ; +import org.apache.jena.sparql.function.FunctionRegistry ; +import org.apache.jena.sparql.sse.builders.ExprBuildException ; +import org.apache.jena.sparql.util.Context ; + +/** XPath and XQuery Functions and Operators 3.1 + * + * {@code fn:apply(function, args)} + */ +public class FN_Apply extends FunctionBase { +// Assumes one object per use site. 
+private Cache cache1 = CacheFactory.createOneSlotCache(); + +@Override +public void checkBuild(String uri, ExprList args) { +if ( args.isEmpty() ) +throw new ExprBuildException("fn:apply: no function to call (minimum number of args is one)"); +} +@Override +public NodeValue exec(List args) { +if ( args.isEmpty() ) +throw new ExprBuildException("fn:apply: no function to call (minimum number of args is one)"); +NodeValue functionId = args.get(0); +List argExprs = args.subList(1,args.size()) ; +ExprList exprs = new ExprList(); +argExprs.forEach((a)->exprs.add(a)); --- End diff -- ARGH OMG I hate you Java generics. I've gotten used to Scala's flexibility.
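What the generics complaint is about, as a sketch with simplified stand-in types (these `Expr`, `NodeValue`, and `ExprList` classes are hypothetical, not Jena's): Java generics are invariant, so a `List<NodeValue>` cannot be passed where a `List<Expr>` is expected even though every `NodeValue` is an `Expr`. `Collections.unmodifiableList`, whose parameter is `List<? extends T>`, yields a read-only `List<Expr>` view without copying; `new ArrayList<Expr>(rest)` would be the copying alternative.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class GenericsSketch {
    // Simplified, hypothetical stand-ins (not Jena's classes): a NodeValue is an Expr.
    interface Expr { }

    static final class NodeValue implements Expr {
        final int value;
        NodeValue(int value) { this.value = value; }
    }

    static final class ExprList {
        private final List<Expr> exprs;
        ExprList(List<Expr> exprs) { this.exprs = exprs; }
        int size() { return exprs.size(); }
    }

    static ExprList argsAsExprList(List<NodeValue> args) {
        List<NodeValue> rest = args.subList(1, args.size());
        // new ExprList(rest) does not compile: List<NodeValue> is not a List<Expr>,
        // because generic types are invariant in Java.
        // unmodifiableList(List<? extends T>) returns a read-only List<T> view; nothing is copied.
        List<Expr> view = Collections.unmodifiableList(rest);
        return new ExprList(view);
    }

    public static void main(String[] args) {
        List<NodeValue> callArgs = Arrays.asList(new NodeValue(0), new NodeValue(1), new NodeValue(2));
        System.out.println(argsAsExprList(callArgs).size()); // 2
    }
}
```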
[GitHub] jena pull request #273: JENA-1372: fn:apply and fn:collation-key
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/273#discussion_r133992495 --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/function/library/FN_Apply.java --- @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.jena.sparql.function.library; + +import java.util.List ; + +import org.apache.jena.atlas.lib.Cache ; +import org.apache.jena.atlas.lib.CacheFactory ; +import org.apache.jena.graph.Node ; +import org.apache.jena.sparql.expr.ExprEvalException ; +import org.apache.jena.sparql.expr.ExprList ; +import org.apache.jena.sparql.expr.NodeValue ; +import org.apache.jena.sparql.function.Function ; +import org.apache.jena.sparql.function.FunctionBase ; +import org.apache.jena.sparql.function.FunctionFactory ; +import org.apache.jena.sparql.function.FunctionRegistry ; +import org.apache.jena.sparql.sse.builders.ExprBuildException ; +import org.apache.jena.sparql.util.Context ; + +/** XPath and XQuery Functions and Operators 3.1 + * + * {@code fn:apply(function, args)} + */ +public class FN_Apply extends FunctionBase { +// Assumes one object per use site. 
+private Cache cache1 = CacheFactory.createOneSlotCache(); + +@Override +public void checkBuild(String uri, ExprList args) { +if ( args.isEmpty() ) +throw new ExprBuildException("fn:apply: no function to call (minimum number of args is one)"); +} +@Override +public NodeValue exec(List args) { +if ( args.isEmpty() ) --- End diff -- Okay, so the checks could be factored out into one method and called from both `checkBuild` and `exec`, right?
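A sketch of that factoring, with hypothetical names: the argument check lives in one private method that both `checkBuild` and `exec` call, so the rule and its message are defined once. `IllegalArgumentException` stands in for `ExprBuildException`, and the function lookup and call in `exec` are omitted.

```java
import java.util.List;

final class ApplyArgsCheckSketch {
    private static final String NO_FUNCTION =
        "fn:apply: no function to call (minimum number of args is one)";

    // The shared check: one place for the rule and its message.
    private static void checkArgs(List<?> args) {
        if (args.isEmpty())
            throw new IllegalArgumentException(NO_FUNCTION);
    }

    // Build-time path, as in checkBuild(uri, args).
    void checkBuild(String uri, List<?> args) {
        checkArgs(args);
    }

    // Run-time path, as in exec(args); returns the function identifier
    // because the actual lookup and call are omitted from this sketch.
    String exec(List<String> args) {
        checkArgs(args);
        return args.get(0);
    }

    public static void main(String[] args) {
        ApplyArgsCheckSketch f = new ApplyArgsCheckSketch();
        System.out.println(f.exec(List.of("ex:f", "x"))); // ex:f
        try {
            f.checkBuild("ex:apply", List.of());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```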
Re: Custom querying algorithm in Jena
This might be a better question for the Jena dev@ list. I'm copying it there. In any event, can you say a little more about what you mean by "a new querying algorithm"? Presumably you have some specific technique you are investigating? ajs6f e1425...@student.tuwien.ac.at wrote on 8/18/17 10:37 AM: Dear Jena development community, my name is Markus Buchta and I am a student at the University of Technology of Vienna. For my bachelor's thesis I want to implement a new querying algorithm in Jena. Since the project is pretty large and pretty hard to understand for a new developer, I want to know if you have any tips for me. I am asking myself where I should start and what is even possible to change in the query evaluation process. I want to thank you in advance for your help and wish you a nice weekend. Sincerely Markus Buchta
[GitHub] jena pull request #273: JENA-1372: fn:apply and fn:collation-key
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/273#discussion_r133971637 --- Diff: jena-base/src/main/java/org/apache/jena/atlas/lib/cache/Cache0.java --- @@ -39,7 +39,13 @@ public V getIfPresent(K key) { @Override public V getOrFill(K key, Callable callable) { -return null ; +try { +return callable.call() ; +} +catch (Exception e) { +e.printStackTrace(); --- End diff -- `printStackTrace()`? Isn't it better to use a logger?
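A sketch of the logger alternative, assuming a no-op cache like `Cache0` that stores nothing and therefore always calls the supplier. `java.util.logging` is used only to keep the example self-contained; a project would normally route this through its own logging facade. Returning `null` after logging (mirroring `getIfPresent`) is an assumption, since the diff is cut off before showing what happens after the exception is caught.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical no-op cache in the spirit of Cache0.
final class NoOpCache<K, V> {
    private static final Logger LOG = Logger.getLogger(NoOpCache.class.getName());

    // Nothing is ever stored.
    V getIfPresent(K key) {
        return null;
    }

    // Always computes the value, since there is no slot to fill.
    V getOrFill(K key, Callable<V> callable) {
        try {
            return callable.call();
        } catch (Exception e) {
            // Logged with a level and the exception attached, instead of printStackTrace().
            LOG.log(Level.WARNING, "Failed to compute value for key " + key, e);
            return null; // assumption: report failure the same way as "not present"
        }
    }

    public static void main(String[] args) {
        NoOpCache<String, Integer> cache = new NoOpCache<>();
        System.out.println(cache.getOrFill("answer", () -> 42)); // 42
        // Prints null; the warning goes to the logging handlers (stderr by default).
        System.out.println(cache.getOrFill("bad", () -> { throw new IllegalStateException("boom"); }));
    }
}
```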
[GitHub] jena pull request #273: JENA-1372: fn:apply and fn:collation-key
Github user ajs6f commented on a diff in the pull request: https://github.com/apache/jena/pull/273#discussion_r133970438 --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/function/library/FN_Apply.java --- @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.jena.sparql.function.library; + +import java.util.List ; + +import org.apache.jena.atlas.lib.Cache ; +import org.apache.jena.atlas.lib.CacheFactory ; +import org.apache.jena.graph.Node ; +import org.apache.jena.sparql.expr.ExprEvalException ; +import org.apache.jena.sparql.expr.ExprList ; +import org.apache.jena.sparql.expr.NodeValue ; +import org.apache.jena.sparql.function.Function ; +import org.apache.jena.sparql.function.FunctionBase ; +import org.apache.jena.sparql.function.FunctionFactory ; +import org.apache.jena.sparql.function.FunctionRegistry ; +import org.apache.jena.sparql.sse.builders.ExprBuildException ; +import org.apache.jena.sparql.util.Context ; + +/** XPath and XQuery Functions and Operators 3.1 + * + * {@code fn:apply(function, args)} + */ +public class FN_Apply extends FunctionBase { +// Assumes one object per use site. 
+private Cache cache1 = CacheFactory.createOneSlotCache(); + +@Override +public void checkBuild(String uri, ExprList args) { +if ( args.isEmpty() ) +throw new ExprBuildException("fn:apply: no function to call (minimum number of args is one)"); +} +@Override +public NodeValue exec(List args) { +if ( args.isEmpty() ) +throw new ExprBuildException("fn:apply: no function to call (minimum number of args is one)"); +NodeValue functionId = args.get(0); +List argExprs = args.subList(1,args.size()) ; +ExprList exprs = new ExprList(); +argExprs.forEach((a)->exprs.add(a)); --- End diff -- Maybe `argExprs.forEach(exprs::add);`? It's not clear to me why not `ExprList exprs = new ExprList(argExprs);`. I'm sure there's a reason-- I must be missing something in the execution flow? Why copy instead of view?
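The method reference is equivalent to the lambda. The copy-versus-view question matters because `subList` returns a view backed by the original list, so later changes to the arguments show through it, while a list filled with `forEach(copy::add)` is a snapshot. A minimal sketch with plain strings (class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

final class CopyVsViewSketch {
    // A view: shares storage with the original list; nothing is copied.
    static List<String> argsView(List<String> call) {
        return call.subList(1, call.size());
    }

    // A copy: filled with a method reference, equivalent to a -> copy.add(a).
    static List<String> argsCopy(List<String> call) {
        List<String> copy = new ArrayList<>();
        argsView(call).forEach(copy::add);
        return copy;
    }

    public static void main(String[] args) {
        List<String> call = new ArrayList<>(List.of("ex:f", "a", "b"));
        List<String> view = argsView(call);
        List<String> copy = argsCopy(call);
        call.set(1, "A");         // not a structural change, so the view stays valid
        System.out.println(view); // [A, b]
        System.out.println(copy); // [a, b]
    }
}
```

A structural change to the backing list (adding or removing elements) would invalidate the view, which is one reason to prefer a copy when the view may outlive the original call.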