[GitHub] jena pull request #314: JENA-1430

2017-11-29 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/314#discussion_r153813903
  
--- Diff: 
jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java
 ---
@@ -26,25 +30,29 @@
 import org.apache.jena.atlas.logging.Log ;
 import org.apache.jena.query.Dataset ;
 import org.apache.jena.query.DatasetFactory ;
-import org.apache.jena.rdf.model.Model ;
-import org.apache.jena.rdf.model.RDFNode ;
-import org.apache.jena.rdf.model.Resource ;
+import org.apache.jena.rdf.model.*;
 import org.apache.jena.sparql.graph.GraphFactory ;
 import org.apache.jena.sparql.util.FmtUtils ;
 import org.apache.jena.sparql.util.graph.GraphUtils ;
+import org.apache.jena.system.Txn;
 
 public class DatasetAssembler extends AssemblerBase implements Assembler {
 public static Resource getType() {
 return DatasetAssemblerVocab.tDataset ;
 }
 
 @Override
-public Object open(Assembler a, Resource root, Mode mode) {
+public Dataset open(Assembler a, Resource root, Mode mode) {
 Dataset ds = createDataset(a, root, mode) ;
 return ds ;
 }
 
 public Dataset createDataset(Assembler a, Resource root, Mode mode) {
+checkType(root, DatasetAssemblerVocab.tDataset);
+// use TIM if quads are loaded or if all named Graphs are loaded 
via data property
+final boolean allNamedGraphsLoadViaData = multiValueResource(root, 
pNamedGraph).stream().allMatch(g -> g.hasProperty(data));
+if (root.hasProperty(data) || allNamedGraphsLoadViaData) return 
new InMemDatasetAssembler().open(a, root, mode);
--- End diff --

Much clearer than otherwise, to my eye. This is style, and I'm happy to 
change this to be clearer for you, but it's not an objective question.


---


[GitHub] jena pull request #317: JENA-1440: TDB2 - transform bytes to NodeIds directl...

2017-11-27 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/317#discussion_r153284608
  
--- Diff: 
jena-db/jena-dboe-base/src/main/java/org/apache/jena/dboe/base/buffer/RecordBufferIteratorMapper.java
 ---
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.dboe.base.buffer;
+
+import static org.apache.jena.atlas.lib.Alg.decodeIndex ;
+
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+
+import org.apache.jena.atlas.lib.Bytes;
+import org.apache.jena.dboe.base.record.Record;
+import org.apache.jena.dboe.base.record.RecordMapper;
+
+// Iterate over one RecordBuffer
+public class RecordBufferIteratorMapper implements Iterator
+{
+private RecordBuffer rBuff ;
+private int nextIdx ;
+private X slot = null ;
+private final byte[] keySlot ;
+private final Record maxRec ;
+private final Record minRec ;
+private final RecordMapper mapper;
+
+//RecordBufferIteratorMapper(RecordBuffer rBuff)
+//{ this(rBuff, null, null); }
+
+RecordBufferIteratorMapper(RecordBuffer rBuff, Record minRecord, 
Record maxRecord, int keyLen, RecordMapper mapper)
+{
+this.rBuff = rBuff ;
+this.mapper = mapper ;
+this.keySlot = (maxRecord==null) ? null : new byte[keyLen];
+nextIdx = 0 ;
+minRec = minRecord ;
+if ( minRec != null )
+{
+nextIdx = rBuff.find(minRec) ;
+if ( nextIdx < 0 )
+nextIdx = decodeIndex(nextIdx) ;
+}
+
+maxRec = maxRecord ; 
+}
+
+private void finish()
+{
+rBuff = null ;
+nextIdx = -99 ;
--- End diff --

Might be nice to call this out as a constant, like `NO_NEXT_INDEX` or the 
like.


---


Re: Jena 3.6.0?

2017-11-27 Thread ajs6f
Right, I just wouldn't want to make 3.6.0 wait on it if the other stuff gets 
done.

ajs6f

> On Nov 27, 2017, at 9:51 AM, Andy Seaborne  wrote:
> 
> 
> 
> On 27/11/17 14:30, ajs6f wrote:
>> Comments inline...
>> ajs6f
>>> On Nov 27, 2017, at 8:10 AM, Andy Seaborne  wrote:
>>> 
>>> ...
>>> 1/ The jena-text documentation improvements
>> Is this required for or by a release? Can we not do this independently?
> 
> Required? No.
> 
> It needs doing and the website gets updated on release.
> 
>Andy



Re: Jena 3.6.0?

2017-11-27 Thread ajs6f
Comments inline...

ajs6f

> On Nov 27, 2017, at 8:10 AM, Andy Seaborne  wrote:
> 
> ...
> 1/ The jena-text documentation improvements

Is this required for or by a release? Can we not do this independently?

> 2/ Downgrade shiro to 1.2.6
> 3/ riot: status code on warnings (#315)

+1 to merging; I would ideally like to confirm the fix with Ian Dickinson 
before closing the ticket.

> 4/ Ideally, dataset assembler (#314) [might be too tight for time].

Waiting on feedback from Andy (and anyone else who might be interested).

> Anything else?

1391 is still hanging, but with a release this close I don't think I can write 
enough tests before then to feel comfortable sending a PR, so let's leave it be.

> 
> Rob - I can merge #315 and we can sort out the implementation stuff later.
> 
>Andy
> 
> On 25/11/17 23:45, ajs6f wrote:
>> Ditto, except for me it's the 8th.
>> ajs6f
>>> On Nov 25, 2017, at 6:12 PM, Bruno P. Kinoshita 
>>>  wrote:
>>> 
>>> I can run the build and verify signatures any day in the next weeks. Just 
>>> not much time to properly test Fuseki and review changes until after Dec 
>>> 3rd.
>>> CheersBruno
>>> 
>>>  From: Andy Seaborne 
>>> To: "dev@jena.apache.org" 
>>> Sent: Sunday, 26 November 2017 12:02 PM
>>> Subject: Jena 3.6.0?
>>> 
>>> The bug in Fuseki that causes UI uploads to fail, and some other UI
>>> issues, is a bit annoying.
>>> 
>>> Is there the energy and time to vote on a 3.6.0 release if I build one?
>>> Please respond if you'll be able to vote in the next few weeks.
>>> 
>>> If there is - from our experience last time, we can test the latest
>>> development builds now, before a formal VOTE which will shorten the time
>>> in case there is any problems to address.
>>> 
>>> Andy
>>> 
>>> The build is complaining about a Shiro issue - this is harmless and a
>>> problem somewhere in the Fuseki tests. Some state is getting initialized
>>> twice.  It does not happen when Fuseki is run nor does it cause any
>>> tests to fail.  It happens because of the 1.2.4->1.4.0 Shiro upgrade ;
>>> it comes in at 1.2.6 -> 1.3.0. Solution: ship with 1.2.6
>>> 
>>> """
>>> [...] IniRealm  WARN  Users or Roles are already populated.  Configured
>>> Ini instance will be ignored.
>>> """
>>> 
>>> Andy
>>> 
>>> 
>>> 



Re: Jena 3.6.0?

2017-11-25 Thread ajs6f
Ditto, except for me it's the 8th.

ajs6f

> On Nov 25, 2017, at 6:12 PM, Bruno P. Kinoshita 
>  wrote:
> 
> I can run the build and verify signatures any day in the next weeks. Just not 
> much time to properly test Fuseki and review changes until after Dec 3rd.
> CheersBruno
> 
>  From: Andy Seaborne 
> To: "dev@jena.apache.org"  
> Sent: Sunday, 26 November 2017 12:02 PM
> Subject: Jena 3.6.0?
> 
> The bug in Fuseki that causes UI uploads to fail, and some other UI 
> issues, is a bit annoying.
> 
> Is there the energy and time to vote on a 3.6.0 release if I build one?
> Please respond if you'll be able to vote in the next few weeks.
> 
> If there is - from our experience last time, we can test the latest 
> development builds now, before a formal VOTE which will shorten the time 
> in case there is any problems to address.
> 
> Andy
> 
> The build is complaining about a Shiro issue - this is harmless and a 
> problem somewhere in the Fuseki tests. Some state is getting initialized 
> twice.  It does not happen when Fuseki is run nor does it cause any 
> tests to fail.  It happens because of the 1.2.4->1.4.0 Shiro upgrade ; 
> it comes in at 1.2.6 -> 1.3.0. Solution: ship with 1.2.6
> 
> """
> [...] IniRealm  WARN  Users or Roles are already populated.  Configured 
> Ini instance will be ignored.
> """
> 
> Andy
> 
> 
> 



Re: CMS diff: DB2 - Use with Fuseki2

2017-11-25 Thread ajs6f
Committed, thanks!

ajs6f

> On Nov 25, 2017, at 2:40 PM, Laura  wrote:
> 
> Clone URL (Committers only):
> https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Ftdb2%2Ftdb2_fuseki.md
> 
> Laura
> 
> Index: trunk/content/documentation/tdb2/tdb2_fuseki.md
> ===
> --- trunk/content/documentation/tdb2/tdb2_fuseki.md   (revision 1816255)
> +++ trunk/content/documentation/tdb2/tdb2_fuseki.md   (working copy)
> @@ -17,7 +17,7 @@
> PREFIX fuseki:  <http://jena.apache.org/fuseki#>;
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>;
> PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>;
> -PREFIX tdb2:<http://jena.apache.org/2016/tdb#>;;
> +PREFIX tdb2:<http://jena.apache.org/2016/tdb#>;
> PREFIX ja:  <http://jena.hpl.hp.com/2005/11/Assembler#>;
> 
> [] rdf:type fuseki:Server ;
> 



Re: mapping URIs

2017-11-22 Thread ajs6f
Claude, are you saying you want people to be able to query Fuseki using 
urn:foo:bar:yeehaw and get back answers using http://server:8080/yeehaw?

Otherwise, I'm guessing I'm missing something, but why wouldn't you do the 
substitutions on the way from the backend to Fuseki?

ajs6f

> On Nov 22, 2017, at 12:13 PM, Claude Warren  wrote:
> 
> I have a case where data are generated in a backend system that is not
> publicly accessible and has no idea where the data are going to be served
> from.
> 
> The backend system generates URNs like ""
> 
> What I think I want to do is on the fuseki server be able to configure
> "urn:foo:bar" as a place holder for "http://server:8080/yeehaw";.
> 
> Now, I know I can add this as part of an OWL:sameValue but I would like to
> see Fuseki do that.
> 
> In this way when the data are hosted on another system the resolution can
> be adjusted appropriately.
> 
> Perhaps this does not make sense.  Perhaps there is a way to do this
> already.  Perhaps this is a really bad idea.  So I am throwing it out there
> to see if there are any comments.
> 
> Thx,
> Claude
> 
> -- 
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren



[GitHub] jena issue #314: JENA-1430

2017-11-22 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/314
  
@afs What do you think of that? It's clearer, I think, along the lines [you 
suggested](https://github.com/apache/jena/pull/314#discussion_r152289270).


---


[GitHub] jena pull request #314: JENA-1430

2017-11-21 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/314#discussion_r152397704
  
--- Diff: 
jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java
 ---
@@ -58,27 +64,33 @@ public Dataset createDataset(Assembler a, Resource 
root, Mode mode) {
 // Assembler description did not define one.
 dftModel = GraphFactory.makeDefaultModel() ;
 Dataset ds = DatasetFactory.create(dftModel) ;
-//  Named graphs
-List nodes = GraphUtils.multiValue(root, 
DatasetAssemblerVocab.pNamedGraph) ;
-for ( RDFNode n : nodes ) {
-if ( !(n instanceof Resource) )
-throw new DatasetAssemblerException(root, "Not a resource: 
" + FmtUtils.stringForRDFNode(n));
-Resource r = (Resource)n;
-
-String gName = GraphUtils.getAsStringValue(r, 
DatasetAssemblerVocab.pGraphName);
-Resource g = GraphUtils.getResourceValue(r, 
DatasetAssemblerVocab.pGraph);
-if ( g == null ) {
-g = GraphUtils.getResourceValue(r, 
DatasetAssemblerVocab.pGraphAlt);
-if ( g != null ) {
-Log.warn(this, "Use of old vocabulary: use :graph not 
:graphData");
-} else {
-throw new DatasetAssemblerException(root, "no graph 
for: " + gName);
+Txn.executeWrite(ds, () -> {
+// Load data into the default graph or quads into the dataset.
+multiValueAsString(root, data)
+.forEach(dataURI -> read(ds, dataURI));
+
--- End diff --

@afs What's a good idiom for switching to a new assembler? In other words, 
let's say the code tests for the presence of quads and finds them and it's time 
to pivot to TIM. Obviously, I could just `new InMemDatasetAssembler()`, but I'm 
thinking there must be a more elegant way. I looked at `AssemblerUtils` but 
only saw ways to register `Assembler`s, not retrieve them…


---


[GitHub] jena pull request #314: JENA-1430

2017-11-21 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/314#discussion_r152311436
  
--- Diff: 
jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java
 ---
@@ -58,27 +64,33 @@ public Dataset createDataset(Assembler a, Resource 
root, Mode mode) {
 // Assembler description did not define one.
 dftModel = GraphFactory.makeDefaultModel() ;
 Dataset ds = DatasetFactory.create(dftModel) ;
-//  Named graphs
-List nodes = GraphUtils.multiValue(root, 
DatasetAssemblerVocab.pNamedGraph) ;
-for ( RDFNode n : nodes ) {
-if ( !(n instanceof Resource) )
-throw new DatasetAssemblerException(root, "Not a resource: 
" + FmtUtils.stringForRDFNode(n));
-Resource r = (Resource)n;
-
-String gName = GraphUtils.getAsStringValue(r, 
DatasetAssemblerVocab.pGraphName);
-Resource g = GraphUtils.getResourceValue(r, 
DatasetAssemblerVocab.pGraph);
-if ( g == null ) {
-g = GraphUtils.getResourceValue(r, 
DatasetAssemblerVocab.pGraphAlt);
-if ( g != null ) {
-Log.warn(this, "Use of old vocabulary: use :graph not 
:graphData");
-} else {
-throw new DatasetAssemblerException(root, "no graph 
for: " + gName);
+Txn.executeWrite(ds, () -> {
+// Load data into the default graph or quads into the dataset.
+multiValueAsString(root, data)
+.forEach(dataURI -> read(ds, dataURI));
+
--- End diff --

Okay, if I get what you are saying, it's:

1. Check to see if quads are being loaded, if so, TIM. 
2. Otherwise, check the named graphs. If they are all `ja:data` guys, then 
TIM again.
3. Otherwise, general dataset.

I'll get on this later today.


---


[GitHub] jena pull request #314: JENA-1430

2017-11-21 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/314#discussion_r152284128
  
--- Diff: 
jena-arq/src/main/java/org/apache/jena/sparql/core/assembler/DatasetAssembler.java
 ---
@@ -58,27 +64,33 @@ public Dataset createDataset(Assembler a, Resource 
root, Mode mode) {
 // Assembler description did not define one.
 dftModel = GraphFactory.makeDefaultModel() ;
 Dataset ds = DatasetFactory.create(dftModel) ;
-//  Named graphs
-List nodes = GraphUtils.multiValue(root, 
DatasetAssemblerVocab.pNamedGraph) ;
-for ( RDFNode n : nodes ) {
-if ( !(n instanceof Resource) )
-throw new DatasetAssemblerException(root, "Not a resource: 
" + FmtUtils.stringForRDFNode(n));
-Resource r = (Resource)n;
-
-String gName = GraphUtils.getAsStringValue(r, 
DatasetAssemblerVocab.pGraphName);
-Resource g = GraphUtils.getResourceValue(r, 
DatasetAssemblerVocab.pGraph);
-if ( g == null ) {
-g = GraphUtils.getResourceValue(r, 
DatasetAssemblerVocab.pGraphAlt);
-if ( g != null ) {
-Log.warn(this, "Use of old vocabulary: use :graph not 
:graphData");
-} else {
-throw new DatasetAssemblerException(root, "no graph 
for: " + gName);
+Txn.executeWrite(ds, () -> {
+// Load data into the default graph or quads into the dataset.
+multiValueAsString(root, data)
+.forEach(dataURI -> read(ds, dataURI));
+
--- End diff --

I'm trying to think of a use case for "load quads into non-TIM" and the one 
that occurs to me is in an embedded or integrated situation where you have a 
lot of quads, like so many that you prefer the memory-parsimonious-ness of the 
general IM dataset, maybe because you have other processes running in the 
system. Sound likely enough to merit (2)? 


---


[GitHub] jena pull request #314: JENA-1430

2017-11-21 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/314#discussion_r152283404
  
--- Diff: jena-fuseki2/examples/fuseki-in-mem-txn.ttl ---
@@ -0,0 +1,24 @@
+## Licensed under the terms of http://www.apache.org/licenses/LICENSE-2.0
+
+@prefix :<#> .
+@prefix fuseki:  <http://jena.apache.org/fuseki#> .
+@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
+@prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#> .
+@prefix ja:  <http://jena.hpl.hp.com/2005/11/Assembler#> .
+@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
+
+<#serviceInMemory> rdf:type fuseki:Service;
+rdfs:label   "In-memory, trasnactioal dataset.";
+fuseki:name  "ds";
+fuseki:serviceQuery  "query";
+fuseki:serviceQuery  "sparql";
+fuseki:serviceUpdate "update";
+fuseki:serviceUpload "upload" ;
+fuseki:serviceReadGraphStore "data" ;
+fuseki:serviceReadGraphStore "get" ;
+fuseki:dataset <#dataset> ;
+.
+
+<#dataset> rdf:type ja:DatasetTxnMem;
--- End diff --

Nice catch, thanks, fixed.


---


[GitHub] jena issue #306: Algorithms for JENA-1414

2017-11-20 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/306
  
Okay, now I get it. Agreed that number 3 is "trying too hard" and on the 
proposal to provide number 2 and document appropriate usage.


---


[GitHub] jena pull request #306: Algorithms for JENA-1414

2017-11-20 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/306#discussion_r152085996
  
--- Diff: jena-core/src/main/java/org/apache/jena/graph/GraphUtil.java ---
@@ -246,43 +282,214 @@ private static void deleteIteratorWorkerDirect(Graph 
graph, Iterator it)
 }
 }
 
-private static final int sliceSize = 1000 ;
-/** A safe and cautious remove() function that converts the remove to
- *  a number of {@link Graph#delete(Triple)} operations. 
+private static int MIN_SRC_SIZE   = 1000 ;
+// If source and destination are large, limit the search for the best 
way round to "deleteFrom" 
+private static int MAX_SRC_SIZE   = 1000*1000 ;
+private static int DST_SRC_RATIO  = 2 ;
+
+/**
+ * Delete triples in the destination (arg 1) as given in the source 
(arg 2).
+ *
+ * @implNote
+ *  This is designed for the case of {@code dstGraph} being comparable 
or much larger than
+ *  {@code srcGraph} or {@code srcGraph} having a lot of triples to 
actually be
+ *  deleted from {@code dstGraph}. This includes large, persistent 
{@code dstGraph}.
+ *
+ *  It is not designed for a large {@code srcGraph} and large {@code 
dstGraph} 
+ *  with only a few triples in common delete from {@code dstGraph}. It 
is better to
+ *  calculate the difference in someway, and copy into a small graph 
to use as the {@srcGraph}.  
--- End diff --

typo: some way


---


[GitHub] jena pull request #314: JENA-1430

2017-11-20 Thread ajs6f
GitHub user ajs6f opened a pull request:

https://github.com/apache/jena/pull/314

JENA-1430

Includes #313, plus:

- Extend testing to `DatasetAssembler`
- Ensure that `DatasetAssembler` can also load quads
- Correct `ja:DatasetTxnMem` => `ja:MemoryDataset`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajs6f/jena JENA-1430p

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/314.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #314


commit d174ec04dccb205de96e63c775e01f948380f8cc
Author: Andy Seaborne 
Date:   2017-11-20T10:57:01Z

JENA-1430: Read quads for ja:data by filename

commit 3e13dc64f4047eb589d9da46e50561a25290a230
Author: ajs6f 
Date:   2017-11-20T18:47:42Z

JENA-1430: Quad loading for in-memory assemblers




---


Re: CMS diff: Jena Full Text Search

2017-11-20 Thread ajs6f
I went to review this diff and rediscovered (to my chagrin) that I really know 
very little about Jena's text indexing.

Osma (or anyone else who knows text indexing better than do I, which wouldn't 
take much)-- could you review this? It's got some great useful detail about how 
the indexing works and can be used.

ajs6f

> On Nov 20, 2017, at 1:51 AM, Chris Tomlinson  wrote:
> 
> Clone URL (Committers only):
> https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Fquery%2Ftext-query.mdtext
> 
> Chris Tomlinson
> 
> Index: trunk/content/documentation/query/text-query.mdtext
> ===
> --- trunk/content/documentation/query/text-query.mdtext   (revision 
> 1815762)
> +++ trunk/content/documentation/query/text-query.mdtext   (working copy)
> @@ -1,5 +1,7 @@
> Title: Jena Full Text Search
> 
> +Title: Jena Full Text Search
> +
> This extension to ARQ combines SPARQL and full text search via
> [Lucene](https://lucene.apache.org) 6.4.1 or
> [ElasticSearch](https://www.elastic.co) 5.2.1 (which is built on
> @@ -64,7 +66,20 @@
> ## Table of Contents
> 
> -   [Architecture](#architecture)
> +-   [External content](#external-content)
> +-   [External applications](#external-applications)
> +-   [Document structure](#document-structure)
> -   [Query with SPARQL](#query-with-sparql)
> +-   [Syntax](#syntax)
> +-   [Input arguments](#input-arguments)
> +-   [Output arguments](#output-arguments)
> +-   [Query strings](#query-strings)
> +-   [Simple queries](#simple-queries)
> +-   [Queries with language tags](#queries-with-language-tags)
> +-   [Queries that retrieve literals](#queries-that-retrieve-literals)
> +-   [Queries across multiple 
> `Field`s](#queries-across-multiple-fields)
> +-   [Queries within a `Field`](#queries-within-a-field)
> +-   [Good practice](#good-practice)
> -   [Configuration](#configuration)
> -   [Text Dataset Assembler](#text-dataset-assembler)
> -   [Configuring an analyzer](#configuring-an-analyzer)
> @@ -134,6 +149,69 @@
> By using Elasticsearch, other applications can share the text index with
> SPARQL search.
> 
> +### Document structure
> +
> +As mentioned above, text indexing of a triple involves associating a Lucene
> +document with the triple. How is this done?
> +
> +Lucene documents are composed of `Field`s. Indexing and searching are 
> performed 
> +over the contents of these `Field`s. For an RDF triple to be indexed in 
> Lucene the 
> +_property_ of the triple must be 
> +[configured in the entity map of a TextIndex](#entity-map-definition).
> +This associates a Lucene analyzer with the _`property`_ which will be used
> +for indexing and search. The _`property`_ becomes the _searchable_ Lucene 
> +`Field` in the resulting document.
> +
> +A Lucene index includes a _default_ `Field`, which is specified in the 
> configuration, 
> +that is the field to search if not otherwise named in the query. In 
> jena-text 
> +this field is configured via the `text:defaultField` property which is then 
> mapped 
> +to a specific RDF property via `text:predicate` (see [entity 
> map](#entity-map-definition) 
> +below).
> +
> +There are several additional `Field`s that will be included in the
> +document that is passed to the Lucene `IndexWriter` depending on the
> +configuration options that are used. These additional fields are used to
> +manage the interface between Jena and Lucene and are not generally 
> +searchable per se.
> +
> +The most important of these additional `Field`s is the `text:entityField`.
> +This configuration property defines the name of the `Field` that will contain
> +the _URI_ or _blank node id_ of the _subject_ of the triple being indexed. 
> This property does
> +not have a default and must be specified for most uses of `jena-text`. This
> +`Field` is often given the name, `uri`, in examples. It is via this `Field`
> +that `?s` is bound in a typical use such as:
> +
> +select ?s
> +where {
> +?s text:query "some text"
> +}
> +
> +Other `Field`s that may be configured: `text:uidField`, `text:graphField`,
> +and so on are discussed below.
> +
> +Given the triple:
> +
> +ex:SomeOne skos:prefLabel "zorn protégé a prés"@fr ;
> +
> +The following illustrates a Lucene document that Jena will create and
> +request Lucene to index:
> +
> +Document<
> +stored, indexed, indexOptions=DOCS http://example.org/SomeOne> 
> +indexed, omitNorms, indexOptions=DOCS 
>  
> +stored, in

Re: gitpubsub

2017-11-19 Thread ajs6f
Bruno (or anyone), do you know if it would be possible to publish site changes 
for review out of Apache CI? (Something like the way we can set up to get built 
artifacts from branches of the codebase without actually releasing them.)

Is it okay with respect to Apache policy to only import the current state of 
the site to Git (iow to leave behind that massive accumulation of Javadocs), or 
do we need to maintain a complete history on whatever infrastructure we use?

ajs6f

> On Nov 17, 2017, at 3:30 AM, Bruno P. Kinoshita 
>  wrote:
> 
>> What changes if we go for gitpubsub?
> 
> 
> Not much for end users. For developers, we would need to get used to 
> whichever tool we choose for static site generator.
> 
> 
>> If I read that right, no CMS because CMS is svnpubsub only.  Is it a "big 
>> bang" switch to Jekyll? That isn't too scary but it is a step-change.
> 
> Not much I think. Most of the Markdown can be easily ported with some 
> regex/shell script. When I helped porting OpenNLP's site, I used Jena website 
> as reference for parts of their new layout and general organization. If you 
> open both sites opennlp.apache.org and jena.apache.org, you may find they are 
> both very similar.
> 
> And we don't have to necessarily use Jekyll. If the consensus is for another 
> tool (e.g. Pelican, Hexo, JBake, etc) we just need to confirm with Apache 
> Infra if they are able to run the same tool in their automation pipeline.
> 
> 
>> One thing we do benefit from currently is content fixes via CMS - we may 
>> have to change that. I guess there is no jena.staging.a.o? It becomes local 
>> Jekyll build?
> 
> As far as I know, that is right. However, users can run something like 
> `jekyll serve`. I like the current process, but if you have a great change, 
> it is hard to get feedback without committing to SVN, having some draft in 
> the staging area.
> 
> With the gitpubsub + some static site generator. Or we can even share our own 
> GitHub fork website. OpenNLP template has an issue with extra paths, so this 
> is broken, but we can work to have Jena website working correctly, and send a 
> pull request to opennlp's repo: https://kinow.github.io/opennlp-site/.
> 
> So if we have a new repository like github.com/apache/jena-site, then I could 
> fork it under github.com/kinow/jena-site, work in my own fork, prepare pull 
> requests, and include a link like https://kinow.github.io/jena-site. I prefer 
> this approach to having to `svn commit` to preview in the staging area.
> 
> 
>> A project can have more then one git repo so I guess we can choose whether 
>> to use the main repo or not.  Our site .svn is 2.2G (probably all those 
>> javadoc changes). Or a separate repo git-include-submodule in the main one?
> 
> Oh, very good point. OpenNLP has/had the same issue. Not sure if that was 
> fixed. Their old docs are served here: 
> http://opennlp.apache.org/docs/legacy.html
> 
> I believe it's done here: 
> https://github.com/apache/opennlp-site/blob/0303866c56689f602dc9258b32e1a64f59ea82e4/pom.xml#L204
> 
> Though not entirely sure how it works. I can join the Slack channel next week 
> and check with them. The first version of the site included all the old 
> javadocs, and was quite slow to check out and build.
> 
> There was some service interruption during the Apache Infra automation 
> set-up. But given OpenNLP just went through the process, it would be simpler, 
> as we could just tell them to look at the job and instead of Maven/JBake, run 
> jekyll or whatever tool we choose. I would be happy to volunteer and create 
> ticket to create jena-site repository in GitHub. Then once we have the site 
> being generated there and we have validated it, I can create the ticket for 
> INFRA to set up the automation, and switch from svnpubsub to gitpubsub.
> 
> 
> Cheers
> Bruno
> 
> 
> 
> 
> From: Andy Seaborne 
> To: dev@jena.apache.org 
> Sent: Sunday, 12 November 2017 4:56 AM
> Subject: gitpubsub
> 
> 
> 
> 
> On 09/11/17 20:51, Bruno P. Kinoshita wrote:
> ...
>> However, I'm +1 for moving our site to Git.
> 
> What changes if we go for gitpubsub?
> 
> All I know about it is the bullet point on 
> https://www.apache.org/dev/project-site.html.
> 
> If I read that right, no CMS because CMS is svnpubsub only.  Is it a 
> "big bang" switch to Jekyll? That isn't too scary but it is a step-change.
> 
> One thing we do benefit from currently is content fixes via CMS - we may 
> have to change that. I guess there is no jena.staging.a.o? It becomes 
> local Jekyll build?
> 
> A project can have more then one git repo so I guess we can choose 
> whether to use the main repo or not.  Our site .svn is 2.2G (probably 
> all those javadoc changes). Or a separate repo git-include-submodule in 
> the main one?
> 
> Andy



[GitHub] jena issue #312: Use top POM as parent.

2017-11-18 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/312
  
Agreed, with SVN you deal in versions and there is a fairly natural mapping 
to modules, in DVC like git you deal with deltas and the module boundaries 
aren't as useful a way to organize change management.

We can always go to full on OSGi and run everything through dynamic 
services for full module decoupling!  :stuck_out_tongue_winking_eye:


---


[GitHub] jena issue #312: Use top POM as parent.

2017-11-18 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/312
  
I agree that Jena doesn't (and shouldn't have) a monolithic build, but do 
we want individual modules to be build-able separately? I'm not sure what the 
use case for that is...


---


Re: Generic RDFVisitor

2017-11-17 Thread ajs6f
Perhaps you can say a little more about your use case here? I think we could 
probably work something out for this feature, but I am curious about why you 
are reaching for the visitor pattern?

ajs6f

> On Nov 17, 2017, at 11:27 AM, Adam Jacobs  wrote:
> 
> Perhaps only a single generic parameter then, if each method should return 
> the same type.
> Or a sub-interface in which all three parameters are the same, the way that 
> Java's `UnaryOperator` is related to `Function`.
> 
> 
> ____
> From: ajs6f 
> Sent: Friday, November 17, 2017 10:01 AM
> To: dev@jena.apache.org
> Subject: Re: Generic RDFVisitor
> 
> Not sure how that would play against:
> 
> Object org.apache.jena.rdf.model.impl.ResourceImpl.visitWith(RDFVisitor)
> 
> OTOH, I'm not sure how much use the visitor pattern there has ever really 
> gotten...
> 
> ajs6f
> 
>> On Nov 17, 2017, at 10:55 AM, Adam Jacobs  wrote:
>> 
>> I wonder if it would be useful to generify the `RDFVisitor` interface...
>> 
>> public interface RDFVisitor {
>> 
>>   B visitBlank( Resource r, AnonId id );
>>   U visitURI( Resource r, String uri );
>>   L visitLiteral( Literal l );
>> 
>> }
> 



[GitHub] jena issue #312: Use top POM as parent.

2017-11-17 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/312
  
You got me with that dry English sense of humor. :wink: 

> No rush to make this change but aiming to change once would be better, 
especially if across a release.

Good point. Let's do this gracefully instead of spastically.

We could look at some other ways to slice verbiage out of the top POM, 
although I admit I can't think of anything that would take as large a slice as 
`dependencyManagement`.


---


[GitHub] jena issue #312: Use top POM as parent.

2017-11-17 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/312
  
What?

"standard techniques" is an ordinary Maven BOM.






---


Re: Generic RDFVisitor

2017-11-17 Thread ajs6f
Not sure how that would play against:

Object org.apache.jena.rdf.model.impl.ResourceImpl.visitWith(RDFVisitor)

OTOH, I'm not sure how much use the visitor pattern there has ever really 
gotten...

ajs6f

> On Nov 17, 2017, at 10:55 AM, Adam Jacobs  wrote:
> 
> I wonder if it would be useful to generify the `RDFVisitor` interface...
> 
> public interface RDFVisitor {
> 
>B visitBlank( Resource r, AnonId id );
>U visitURI( Resource r, String uri );
>L visitLiteral( Literal l );
> 
> }



Re: jena-project

2017-11-17 Thread ajs6f
I'm basically +1 to this-- jena-project was always confusing at best.

In theory, we could factor out some of those 932 lines with a Jena Maven BOM. 
Actually, that might be nice for integrators and those using apache-jena-lib.

ajs6f

> On Nov 17, 2017, at 10:12 AM, Andy Seaborne  wrote:
> 
> When we moved to one version for all modules, pressure of time pushed us to 
> have jena-project as a copy of the old jena-parent.
> 
> Do we want to go the next step forward which is to merge jena-project into 
> the top POM and drop the jena-project module?
> 
> It turns out to be quite easy to do.
> 
> PR for discussion:
>  https://github.com/apache/jena/pull/312
> 
> It does make the top POM quite large - 932 lines.
> 
> Thoughts?
> 
>Andy



Jira and Gitbox integration?

2017-11-17 Thread ajs6f
Hi, INFRA--

Here at Jena we are considering moving our Apache git <-> Github mirroring to 
accept changes at Github and mirror them to Apache git (currently it's the 
other way around). But right now we have some nice Jira integrations, and so we 
have some questions about how that would work if we reversed the mirroring.

Currently, any mention of a Jira ticket (e.g. "I think this could affect 
JENA-1234") in a Github PR automatically copies the conversation for that PR 
over to the comments in that Jira ticket. Will we be able to keep that 
integration if we reverse the mirroring?

Github treats issues/tickets and PRs very similarly-- is it possible to 
integrate Jira in a similar way so that a PR that doesn't mention an extant 
particular Jira ticket automatically files a new Jira ticket?

Thanks for any info and all that you already do for us!

ajs6f

[GitHub] jena pull request #306: Algorithms for JENA-1414

2017-11-17 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/306#discussion_r151692738
  
--- Diff: jena-base/src/main/java/org/apache/jena/atlas/iterator/Iter.java 
---
@@ -351,6 +351,22 @@ public void remove() {
 return filter(iter, Objects::nonNull) ;
 }
 
+/** Step forward up to {@code steps} places.
+ * Return number of steps taken.
--- End diff --

`@return number of steps taken`


---


Re: TDB2 testing Re: TDB2 merged

2017-11-16 Thread ajs6f
> Adding a template name to the HTTP API would be good but IMO it's a long way 
> off to provide UI access.  TDB1 works for people.

This is true, but if we can give people an easy way to create TDB2 dbs and 
compare them apples-to-apples in their own systems, we will get more feedback 
more quickly.

That having been said, I honestly do not know anything about how the Fuseki UI 
is coded. Is it done with a well-known template library?

ajs6f

> On Nov 16, 2017, at 3:02 PM, Andy Seaborne  wrote:
> 
> 
> 
> On 27/10/17 11:44, Osma Suominen wrote:
>> Hi,
>> As I've promised earlier I took TDB2 for a little test drive, using the 
>> 3.5.0rc1 builds.
>> I tested two scenarios: A server running Fuseki, and command line tools 
>> operating directly on a database directory.
>> 1. Server running Fuseki
>> First the server (running as a VM). Currently I've been using Fuseki with 
>> HDT support, from the hdt-java repository. I'm serving a dataset of about 
>> 39M triples, which occasionally changes (eventually this will be updated 
>> once per month, or perhaps more frequently, even once per day). With HDT, I 
>> can simply rebuild the HDT file (less than 10 minutes) and then restart 
>> Fuseki. Downtime for the endpoint is only a few seconds. But I'm worried 
>> about the state of the hdt-java project, it is not being actively maintained 
>> and it's still based on Fuseki1.
> 
> You don't need to use their Fuseki integration.
> 
>> So I switched (for now) to Fuseki2 with TDB2. It was rather smooth thanks to 
>> the documentation that Andy provided. I usually create Fuseki2 datasets via 
>> the API (using curl), but I noticed that, like the UI, the API only supports 
>> "mem" and "tdb". So I created a "tdb" dataset first, then edited the 
>> configuration file so it uses tdb2 instead.
>> Loading the data took about 17 minutes. I used wget for this, per Andy's 
>> example. This is a bit slower than regenerating the HDT, but acceptable 
>> since I'm only doing it occasionally. I also tested executing queries while 
>> reloading the data. This seemed to work OK even though performance obviously 
>> did suffer. But at least the endpoint remained up.
>> The TDB2 directory ended up at 4.6GB. In contrast, the HDT file + index for 
>> the same data is 560MB.
>> I reloaded the same data, and the TDB2 directory grew to 8.5GB, almost twice 
>> its original size. I understand that the TDB2 needs to be compacted 
>> regularly, otherwise it will keep growing. I'm OK with the large disk space 
>> usage if it's constant, not growing over time like TDB1.
>> 2. Command line tools
>> For this I used an older version of the same dataset with 30M triples, the 
>> same one I used for my HDT vs TDB comparison that I posted on the users 
>> mailing list:
>> http://mail-archives.apache.org/mod_mbox/jena-users/201704.mbox/%3C90c0130b-244d-f0a7-03d3-83b47564c990%40iki.fi%3E
>>  This was on my i3-2330M laptop with 8GB RAM and SSD.
> 
> Thank you for the figures.
> 
>> Loading the data using tdb2.tdbloader took about 18 minutes (about 28k 
>> triples per second). The TDB2 directory is 3.7GB. In contrast, using 
>> tdbloader2, loading took 11 minutes and the TDB directory was 2.7GB. So TDB2 
>> is slower to load and takes more disk space than TDB.
> 
> Those are low figures for 40M.  Lack of free RAM? (It's more acute with TDB2 
> ATM as it does random I/O.) RDF syntax? A lot of long literals?
> 
> Today: TDB2:
> 
> INFO  Finished: 50,005,630 bsbm-50m.nt.gz 738.81s (Avg: 67,684)
> 
> 
>> I ran the same example query I used before on the TDB2. The first time was 
>> slow (33 seconds), but subsequent queries took 16.1-18.0 seconds.
>> I also re-ran the same query on TDB using tdbquery on Jena 3.5.0rc1. The 
>> query took 13.7-14.0 seconds after the first run (24 seconds).
>> I also reloaded the same data to the TDB2 to see the effect. Reloading took 
>> 11 minutes and the database grew to 5.7GB. Then I compacted it using 
>> tdb2.tdbcompact. Compacting took 18 minutes and the disk usage just grew 
>> further, to 9.7GB. The database directory then contained both Data-0001 and 
>> Data-0002 directories. I removed Data-0001 and disk usage fell to 4.0GB. Not 
>> quite the same as the original 3.7GB, but close.
>> My impressions so far: It works, but it's slower than TDB and needs more 
>> disk space. Compaction seems to work, but initially it will just increase 
>> disk usage. The stale data has to be manually removed to actually reclaim 
>> any space. 
> 
>

Re: Issues fixed in Apache Jena

2017-11-16 Thread ajs6f
One possibility: Jena does not (and I assume never did) enforce a 
"squash-before-merging" policy.

That is to say, if I write a PR with ten commits, and it is approved, and we 
merge it, it will normally go in as all ten commits. Some projects demand that 
such a PR be "squashed" (all ten commits be reduced into one with the sum of 
changes present) before merging. 

If that is part of the difference, I suppose it should show up in the same way 
as a difference between Jena and other projects in the number of commits per 
time unit in the main branch.

ajs6f

> On Nov 16, 2017, at 7:55 AM, Γεώργιος Δίγκας  wrote:
> 
> Dear All,
> 
> I would like to thank you for your replies!
> 
>>> What is a single issue in your context?
> SonarQube uses a set of coding 
> rules<https://docs.sonarqube.org/display/SONAR/Issues> in order to measure 
> the TD. While running an analysis, it raises an issue every time a piece of 
> code breaks a coding rule.
>>> I think what is being counted is any issue that SonarQube TD reports, and 
>>> this is being done on every single commit and summed together. This doesn’t 
>>> seem like a particularly meaningful statistic since you would inevitably 
>>> count the same issue N times where N is the number of commits between where 
>>> an issue was introduced and where it was fixed. It seems like there should 
>>> really be some attempts to perform de-duplication.
> The number refers to unique issues and it does not include any duplication. 
> (If one issue was fixed and then after some time the same issue appeared in 
> the same piece of code I count as new).
>>> It also sounds like it doesn’t make any attempts to account for common 
>>> development practices i.e. New code often develops over a series of commits 
>>> with developers implementing outlines first and then refining and cleaning 
>>> up a feature and cleaning up a feature as it matures.
> I totally agree with the last sentence. As I said on my previous e-mails the 
> cleaning up rate on your project is the highest among the Apache projects 
> that I analyzed and I am wondering why is that. What practices do you follow? 
> Is it a coincidence?
>>> There are many (, many) minor things and they outweigh the major problems. 
>>> Calling them all "issues" gives them equal weight. Some are about 
>>> canonicalization of the code.
> I have updated the previously sent spreadsheet 
> (https://docs.google.com/spreadsheets/d/1DloQ_GS9l2KS6ldgdHOQkjsCB1J_rrMyUauHC_Ymgfk/edit?usp=sharing).
>  Now on the sheet: Jena: Open Issues - October 7, 2017 I have added the 
> Severity and the Type of each issue and you can filter them based on these 
> two criteria (they are based on SoanrQube's default classification).
>>> NB the "issue" word has a specific meaning for JIRA which a lot of Apache 
>>> projects use. Jena's current total, now, is 1424.
> Thank you for the clarification. I should had mention in my first e-mail that 
> I refer to SonarQube's Issues and to to Jira.
> 
> With kind regards,
> 
> George Digkas
> 
> From: Andy Seaborne 
> Sent: Thursday, November 16, 2017 12:55 PM
> To: Γεώργιος Δίγκας; dev@jena.apache.org
> Subject: Re: Issues fixed in Apache Jena
> 
> Do not take git as complete!
> 
> Jena started in 2000.
> https://lists.w3.org/Archives/Public/www-rdf-interest/2000Aug/0128.html
> 
> Jena 2.0 was released 2003-08-28.
> A whole 40M including dependencies! A 14.7M zip file!
> https://sourceforge.net/projects/jena/files/
> 
> The whole of SF SVN history was imported by the Apache infrastructure
> team (a herculean effort) into Apache SVN. I don't know how to get to it
> from git, it may not be there and only in SVN.
> 
> The earliest git root commit is for the move to Apache from SF
> [4298106f1e], 6 years ago. (There are 4 root commits due to merges)
> 
> ---
> 
> It's an interesting start and to make the analysis usefully inform the
> reader as to the state of the project I suggest treating different kinds
> of issues different, not uniformly important.
> 
> There are many (, many) minor things and they outweigh the major
> problems. Calling them all "issues" gives them equal weight. Some are
> about canonicalization of the code.
> 
> Yet reformatting the whole code base (if practical, which it arguable)
> then greatly decreases the usefulness of git history. That would be a
> huge loss.
> 
> (NB the "issue" word has a specific meaning for JIRA which a lot of
> Apache projects use. Jena's current total, now, is 1424.)
> 
> Andy
> 
>> 
>> Thank you in advance!
>> 
>> 
>> With kind regards,
>> 
>> George Digkas



Re: Immutability

2017-11-15 Thread ajs6f

I think this is one reason that Clerezza introduced a new RDF API:

http://clerezza.apache.org/apidocs/org/apache/clerezza/rdf/core/package-summary.html

So it seems to me that if we want to introduce immutable types, we might want to do that in the context of a completely 
new API. The use of the Java 8 Streams API is also something that has been mooted as something that might merit a new 
Jena API. (Instead of mixing things up in the current one.)


I'm not sure how that plays out with ARQ, though. We would want people to be able to use the new types with ARQ without 
much difficulty.



ajs6f

Claude Warren wrote on 11/14/17 2:43 AM:

In most cases I prefer immutable interfaces.  However, immutable interfaces
pose an interesting problem for contract testing and for the permissions
implementation.

In contract testing you get have a producer to create instances of an
interface and tests you run against it.

However, since you don't have any setters to call on the instance you can
not know what the result of any particular getter should be.

The only choices that I see for this case are:

   1. Don't test the immutable interface separately and therefore miss some
   implementations.  That is only test the immutable interface when paired
   with the mutable interface.
   2. Modify the producer interface so that the producer will create the
   data necessary to execute the tests.  This results in complicated producers.

Keeping in mind that contract tests allow us to write tests for the Graph
interface and then create very simple implementations of suites for each
implementation on Graph.  This means that when we add ad method or detect
an incorrectly implemented method we can modify the contract test and all
implementations are then properly tested.

In the permissions implementation we wrap the interfaces with dynamic
proxies that intercept calls to verify if the user has permission to make
those calls before execution, wrap the results with "secured" versions, and
in some case filter results (e.g. iterators).  This system will be
perfectly happy running against immutable interfaces.  The interesting part
is that the system can take mutable objects and return objects that throw
exceptions when the user does not have access (much the same as the current
read only implementations do).  But you can not know *a priori* which
methods will throw exceptions.

This leads me to one more observation. When building the permissions layer
I learned that simple objects like RDFNode can return complex objects like
Model.  I believe that an immutable model  would have to return an
immutable RDFNode. The signature of the Immutable RDFNode should indicate a
return of an Immutable Model.  But to be a drop in replacement for a
standard RDFNode it will need to return a Model.  Classes like RDF lists
also pose interesting problems.

So while I like immutable interfaces in general, I think that back fitting
them here is problematic.  Scan through the permissions layer for some idea
of the complexity.

Having written all of this I think I have come to believe that low level
tools like Jena or data stores in general, should not have immutable
interfaces.  Immutable interfaces belong at a slightly higher architectural
level or at the extreme boundary of the project.  For example if Jena had a
webservice API that retrieved Models and such then it might makes sense for
the deserialized versions to be immutable.

Claude


On Tue, Nov 14, 2017 at 12:12 AM, Adam Jacobs 
wrote:


The subject of immutability was raised in JENA-1391 (
https://issues.apache.org/jira/browse/JENA-1391).

Specifically, the `getUnionModel` method in Jena 3.4 returns an immutable
model view, and the implementation of the aforementioned story includes
methods that will return an immutable dataset view.

The question is whether these immutable views deserve their own
interfaces. Currently, the views are returned using what I called
"unexpected immutability" because they implement mutable interfaces. This
introduces the potential for `UnsupportedOperationException`s.

Unfortunately, that (degenerate) pattern is used in Java's `Collections`
utility as well (https://docs.oracle.com/javase/8/docs/api/java/util/
Collections.html) but Scala is a clean example to draw inspiration from:
by implementing immutable interfaces as parents to their mutable
counterparts (rather than vice verse) we can satisfy the Liskov
Substitution Principle.

Obviously, implementing that solution is easier to do from scratch than in
an existing code base; but I imagine it could be done in multiple phases,
by introducing the new interfaces and using them in new methods (with easy
conversion to mutability via union) while gradually retrofitting older
methods.

The question then, is whether such a change is worthwhile...







Re: Issues fixed in Apache Jena

2017-11-15 Thread ajs6f

It's not really clear to me how to answer these question without more context.

How did you go about making these calculations? What span of time does your analysis concern? What are you counting as 
technical debt (anything that SonarQube claims is "technical debt")? Are you comparing Jena to other projects with a 
similar lifespan? Are you comparing Jena to projects that have a similar contribution history? etc. etc.




ajs6f

 ?? wrote on 11/15/17 10:15 AM:

Dear developers,

I am a PhD student in the university of Groningen and the topic of my PhD is 
the evolution of Technical Debt (TD) in open-source development.
I have analyzed some projects from the Apache Foundation (using SonarQube) and 
I realized that your project has a tremendous number (405,700) of fixed issues, 
when we compare it to other projects from Apache.
I would like to ask you the following 3 questions:

  1.  Why had been introduced so many issues of TD into your project?
  2.  The fixing of those issues was in purpose or a coincidence?
  3.  Do you use SonarQube (or SonarLint) in order to detect and fix the issues?

Thank you in advance!

With kind regards,

George Digkas



[GitHub] jena issue #307: JENA-1418: Upgrade some versions

2017-11-14 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/307
  
👍 the Commons Lang jump is more complicated because of the issues around 
ISO date format.


---


Re: Gitbox?

2017-11-12 Thread ajs6f

Daniel--

We have begun to discuss this on dev@jena, and one question that immediately came up is how this plays with JIRA 
integration. Currently we have a system in which any mention of any extant JIRA ticket in a Github PR starts copying the 
comments in that PR over to that JIRA ticket, which is useful. Can we assume that the same integration will work the 
same way if we go to "Github as canonical"?


Are there any further integrations available after choosing "Github as canonical", e.g. create-JIRA-ticket-on-PR or the 
like?


Thanks for info!

ajs6f

Daniel Pono Takamori wrote on 11/9/17 1:22 PM:

Gitbox would indeed allow you to have the Github tools available to
committers.  It treats Github as the canonical source (we also keep a
copy on Gitbox), which allows the PRs and issues to be a bit more
convenient (there are still some things we can't support due to the
Github's coarse permission structure).
We require all committers to use Github's 2FA [0] so once you have a
taken a vote in the project, file a ticket on the INFRA JIRA [1] and
then your committers can run through the Gitbox syncing [2] to matchup
ASF IDs and Github IDs.
Let us know if you have any other questions.

[0] - https://help.github.com/articles/providing-your-2fa-authentication-code/
[1] - https://issues.apache.org/jira/browse/INFRA
[2] - https://gitbox.apache.org/setup/

On Thu, Nov 9, 2017 at 10:50 AM,   wrote:

Hi, I'm a committer for Jena. Recently, we had some discussion about our
source management and there was some uncertainty about how we can arrange
the relationship between Github and Apache git.

Currently, commits go against Apache git, and Github picks them up and
mirrors them, which is a bit annoying in that it's not possible to use the
Github PR review machinery transparently. This is not a big deal, but is it
in fact possible to do that? In other words, is it possible to (e.g.) merge
PRs at Github and have Apache git pick up the change?

I went to https://gitbox.apache.org/setup/ and linked my accounts, but that
didn't seem to do anything...

Thanks for any info!

--

ajs6f


Re: Gitbox?

2017-11-12 Thread ajs6f

Yes, I'm on this!

First question, what is a [DISCUSS]? :) I assume you are talking about a dev@ thread with that label to discuss this 
possibility, but is there more to it than that?


As I understand it now, "it" is changing from our current setup, in which we act against Apache git and the results are 
mirrored to Github, to the opposite direction, in which we act against Github and the results are mirrored to Apache 
git. As far as PR machinery goes, an advantage that I see is that we will be able to use the complete excellent web UI 
at Github. Right now, we can comment, review, etc, but when it comes time to merge, it occurs via CLI. It's true that 
that isn't the end of the world, but is both clunky (as Andy notes) and error-prone (I had a annoying problem with it on 
a recent PR). We would also get accurate results from Github's visualization tools, which isn't a major thing, but could 
be nice.


My understanding is that such a change will have no effect at all on our current JIRA integration, but I will get that 
confirmed (or disproved!) by INFRA. Going to Github issues would be a different choice, and I am not arguing for that 
now. (Trying to split as much off of this as possible to keep the decision simpler!)


Bruno-- if it's not obvious, I am intentionally splitting off the question of where we maintain our site, which is a 
really good thing to discuss, but think it is orthogonal.


ajs6f

Andy Seaborne wrote on 11/11/17 10:41 AM:



On 09/11/17 18:24, aj...@apache.org wrote:

Great, thanks!

So folks, is there interest in pursuing this rearrangement of our source 
management? I would certainly vote for it.


Great - do you want to take this on and see us through the process?

We'll need a [DISCUSS] I expect to make sure we all know what the VOTE is really about. 
I'm not clear what "it" is exactly.

I'm all linked up - 2FA etc, and it can see which repo I have access to.


I use JIRA search and git history more and more to understand what users are 
asking about and to trace down bugs.

How does gitbox interact with JIRA?  Do we get a nice set of JIRA comment 
(obvious existing bug - JIRA isn't markdown so
`` and {{}} mess up.

I guess discussion is on GH issues, not JIRA tickets, which is a change but 
fine by me.


The "git pull github pull/XXX/head --no-ff ; git push" is clunky but not the 
end of the world.

(Unrelated wish - delete of the local branch - is that "git fetch --prune"?).


it's not possible to use the Github PR review machinery transparently


This confused me - we use GH PR review at the monent. Could you expand the 
point?


So I guess I want to know what changes in the workflows for submission (nothing 
presumably but what about JIRA? Auto
create-JIRA-on-PR would be fancy) and for acceptance

I'm expecting changes if we go "gitbox" and that's fine, I'm not arguing for 
the status quo. It's not easy to reverse so
I want to know what it is.

Andy

https://github.com/apache/jena/blob/master/CONTRIBUTING.md



ajs6f

Daniel Pono Takamori wrote on 11/9/17 1:22 PM:

Gitbox would indeed allow you to have the Github tools available to
committers.  It treats Github as the canonical source (we also keep a
copy on Gitbox), which allows the PRs and issues to be a bit more
convenient (there are still some things we can't support due to the
Github's coarse permission structure).
We require all committers to use Github's 2FA [0] so once you have a
taken a vote in the project, file a ticket on the INFRA JIRA [1] and
then your committers can run through the Gitbox syncing [2] to matchup
ASF IDs and Github IDs.
Let us know if you have any other questions.

[0] - https://help.github.com/articles/providing-your-2fa-authentication-code/
[1] - https://issues.apache.org/jira/browse/INFRA
[2] - https://gitbox.apache.org/setup/

On Thu, Nov 9, 2017 at 10:50 AM,   wrote:

Hi, I'm a committer for Jena. Recently, we had some discussion about our
source management and there was some uncertainty about how we can arrange
the relationship between Github and Apache git.

Currently, commits go against Apache git, and Github picks them up and
mirrors them, which is a bit annoying in that it's not possible to use the
Github PR review machinery transparently. This is not a big deal, but is it
in fact possible to do that? In other words, is it possible to (e.g.) merge
PRs at Github and have Apache git pick up the change?

I went to https://gitbox.apache.org/setup/ and linked my accounts, but that
didn't seem to do anything...

Thanks for any info!

--

ajs6f


Re: Gitbox?

2017-11-09 Thread ajs6f

Great, thanks!

So folks, is there interest in pursuing this rearrangement of our source 
management? I would certainly vote for it.

ajs6f

Daniel Pono Takamori wrote on 11/9/17 1:22 PM:

Gitbox would indeed allow you to have the Github tools available to
committers.  It treats Github as the canonical source (we also keep a
copy on Gitbox), which allows the PRs and issues to be a bit more
convenient (there are still some things we can't support due to the
Github's coarse permission structure).
We require all committers to use Github's 2FA [0] so once you have a
taken a vote in the project, file a ticket on the INFRA JIRA [1] and
then your committers can run through the Gitbox syncing [2] to matchup
ASF IDs and Github IDs.
Let us know if you have any other questions.

[0] - https://help.github.com/articles/providing-your-2fa-authentication-code/
[1] - https://issues.apache.org/jira/browse/INFRA
[2] - https://gitbox.apache.org/setup/

On Thu, Nov 9, 2017 at 10:50 AM,   wrote:

Hi, I'm a committer for Jena. Recently, we had some discussion about our
source management and there was some uncertainty about how we can arrange
the relationship between Github and Apache git.

Currently, commits go against Apache git, and Github picks them up and
mirrors them, which is a bit annoying in that it's not possible to use the
Github PR review machinery transparently. This is not a big deal, but is it
in fact possible to do that? In other words, is it possible to (e.g.) merge
PRs at Github and have Apache git pick up the change?

I went to https://gitbox.apache.org/setup/ and linked my accounts, but that
didn't seem to do anything...

Thanks for any info!

--

ajs6f


[GitHub] jena pull request #304: JENA-1418: Upgrading minor dependencies and plugins

2017-11-04 Thread ajs6f
GitHub user ajs6f opened a pull request:

https://github.com/apache/jena/pull/304

JENA-1418: Upgrading minor dependencies and plugins



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajs6f/jena JENA-1418

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/304.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #304


commit 548770743dff4dae6a26ae40c8d1abaf613a9daf
Author: ajs6f 
Date:   2017-11-04T14:49:41Z

Upgrading commons-io and commons-codec

commit 75d17cdbb2b788e5fc270226863cb0dd0a911c83
Author: ajs6f 
Date:   2017-11-04T14:53:20Z

Upgrading commons-csv

commit 589fe44ada8a78985af5389efc2593017b51eb61
Author: ajs6f 
Date:   2017-11-04T15:00:18Z

Upgrading Log4j2

commit dd5996150add112587d4a31fd56dd062be21f160
Author: ajs6f 
Date:   2017-11-04T15:05:17Z

Upgrading contract test dependencies

commit 9f93514b78099076257836e754a63253ab71a3e0
Author: ajs6f 
Date:   2017-11-04T15:22:30Z

Maven plugin updates

commit 6c100d92be885f13c31c2674ead85ba2a7f2b383
Author: ajs6f 
Date:   2017-11-04T15:29:53Z

Moving Shiro dependency management

commit 56f2f6bd7641a408a9821d690d0f961f03ff796d
Author: ajs6f 
Date:   2017-11-04T15:47:09Z

Upgrading Shiro




---


[GitHub] jena pull request #289: Version bumps for 3.5

2017-11-04 Thread ajs6f
Github user ajs6f closed the pull request at:

https://github.com/apache/jena/pull/289


---


[GitHub] jena issue #289: Version bumps for 3.5

2017-11-04 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/289
  
I'm closing this PR because for such a small changeset, it's not worth 
figuring out how my delta gots screwed up with the tabs in the `pom.xml`. I'll 
just open a fresh PR in a day or three, for JENA-1418.


---


[GitHub] jena issue #303: JENA-1408: Quicker -Pdev; simplify profiles.

2017-11-04 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/303
  
What's here looks good, but I don't see where `jena-iri` and 
`jena-shaded-guava` are being added in?


---


Re: [VOTE] Release Apache Jena 3.5.0 (RC2)

2017-10-30 Thread ajs6f

Yeah, I was somehow missing your key. Weird, I thought I had imported it a long 
time ago. Oh, well, all good on that front.

+1 to the release.


ajs6f

Andy Seaborne wrote on 10/30/17 10:34 AM:



On 30/10/17 14:04, aj...@apache.org wrote:

I got a clean build with Mac OS X, Maven 3.5.0, Java version: 1.8.0_65, vendor: 
Oracle Corporation.

However, when checking the sigs, I'm getting:

➜  /tmp gpg --verify apache-jena-3.5.0.tar.gz.asc apache-jena-3.5.0.tar.gz
gpg: Signature made Mon Oct 30 05:47:51 2017 EDT
gpg:using RSA key 04C95136D236A58F
gpg: Can't check signature: No public key

And I can't find a sig with that string in the MIT keyserver... Andy, did you 
change keys recently?


Not for a while.

Search for "seaborne"  and I see

pub  4096R/D236A58F 2016-11-04 Andy Seaborne (Code signing key) 


and link to:

https://pgp.mit.edu/pks/lookup?op=get&search=0x04C95136D236A58F

and the public key is in the KEYS file.

 gpg --import KEYS
 pgp < KEYS

Andy



ajs6f

Osma Suominen wrote on 10/30/17 8:50 AM:

Thanks for preparing the second RC Andy! Excellent work, and very timely, 
considering that several problems were found
with the RC1 build late last week and you got all fixes integrated already!

I tried to build RC2 on two different Ubuntu 16.04 machines. They have slightly 
different Java and Maven versions.


On one machine the build (using "mvn clean install") went fine. Maven 3.3.9, 
Java 1.8.0_131 (OpenJDK).

On the other (Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 
2015-11-10T18:41:47+02:00), Java 1.8.0_151 /
Oracle) I first got this:

[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 07:48 min
[INFO] Finished at: 2017-10-30T13:53:07+02:00
[INFO] Final Memory: 101M/829M
[INFO] 
[ERROR] Failed to execute goal 
com.github.alexcojocaru:elasticsearch-maven-plugin:5.2:runforked 
(start-elasticsearch) on
project jena-text-es: Condition returned by method "waitToStart" in class
com.github.alexcojocaru.mojo.elasticsearch.v2.client.Monitor was not fulfilled 
within 30 seconds. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :jena-text-es
[INFO] Stopping the Elasticsearch process at application shutdown ...
[INFO] ... the Elasticsearch process has stopped. Exit code: 143
[INFO] Elasticsearch [0] stopped with exit code 143


So apparently Elasticsearch didn't start properly for the jena-text-es 
integration tests. I happen to have Elasticsearch
running on this machine but IIRC the tests should use a non-standard TCP port, 
so the two Elasticsearch instances
shouldn't interfer with each other.

I just resumed the build without doing anything else, and it worked the second 
time, so maybe it was just a random
transient error. But then I got this:


[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 02:20 min
[INFO] Finished at: 2017-10-30T14:25:55+02:00
[INFO] Final Memory: 89M/788M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-assembly-plugin:2.6:single (create-zip-assembly) 
on
project apache-jena-fuseki: Execution create-zip-assembly of goal
org.apache.maven.plugins:maven-assembly-plugin:2.6:single failed: group id 
'300' is too big ( > 2097151 ). Use STAR
or POSIX extensions to overcome this limit -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :apache-jena-fuseki
[INFO] The Elasticsearch process has already stopped. Nothing to clean up

According to the error above, as well as [1], this can be fixed by using POSIX 
tar format.

I think this happens because on this machine (administered by my employer, 
University of Helsinki, with LDAP
authentication) my user account has a high group ID (300), while the other 
machine is a personal laptop where I've
installed Ubuntu myself and my group ID is 1000.

I can 

Re: [] Release Apache Jena 3.5.0 (RC2)

2017-10-30 Thread ajs6f

I got a clean build with Mac OS X, Maven 3.5.0, Java version: 1.8.0_65, vendor: 
Oracle Corporation.

However, when checking the sigs, I'm getting:

➜  /tmp gpg --verify apache-jena-3.5.0.tar.gz.asc apache-jena-3.5.0.tar.gz
gpg: Signature made Mon Oct 30 05:47:51 2017 EDT
gpg:using RSA key 04C95136D236A58F
gpg: Can't check signature: No public key

And I can't find a sig with that string in the MIT keyserver... Andy, did you 
change keys recently?

ajs6f

Osma Suominen wrote on 10/30/17 8:50 AM:

Thanks for preparing the second RC Andy! Excellent work, and very timely, 
considering that several problems were found
with the RC1 build late last week and you got all fixes integrated already!

I tried to build RC2 on two different Ubuntu 16.04 machines. They have slightly 
different Java and Maven versions.


On one machine the build (using "mvn clean install") went fine. Maven 3.3.9, 
Java 1.8.0_131 (OpenJDK).

On the other (Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 
2015-11-10T18:41:47+02:00), Java 1.8.0_151 /
Oracle) I first got this:

[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 07:48 min
[INFO] Finished at: 2017-10-30T13:53:07+02:00
[INFO] Final Memory: 101M/829M
[INFO] 
[ERROR] Failed to execute goal 
com.github.alexcojocaru:elasticsearch-maven-plugin:5.2:runforked 
(start-elasticsearch) on
project jena-text-es: Condition returned by method "waitToStart" in class
com.github.alexcojocaru.mojo.elasticsearch.v2.client.Monitor was not fulfilled 
within 30 seconds. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :jena-text-es
[INFO] Stopping the Elasticsearch process at application shutdown ...
[INFO] ... the Elasticsearch process has stopped. Exit code: 143
[INFO] Elasticsearch [0] stopped with exit code 143


So apparently Elasticsearch didn't start properly for the jena-text-es 
integration tests. I happen to have Elasticsearch
running on this machine but IIRC the tests should use a non-standard TCP port, 
so the two Elasticsearch instances
shouldn't interfer with each other.

I just resumed the build without doing anything else, and it worked the second 
time, so maybe it was just a random
transient error. But then I got this:


[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 02:20 min
[INFO] Finished at: 2017-10-30T14:25:55+02:00
[INFO] Final Memory: 89M/788M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-assembly-plugin:2.6:single (create-zip-assembly) 
on
project apache-jena-fuseki: Execution create-zip-assembly of goal
org.apache.maven.plugins:maven-assembly-plugin:2.6:single failed: group id 
'300' is too big ( > 2097151 ). Use STAR
or POSIX extensions to overcome this limit -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :apache-jena-fuseki
[INFO] The Elasticsearch process has already stopped. Nothing to clean up

According to the error above, as well as [1], this can be fixed by using POSIX 
tar format.

I think this happens because on this machine (administered by my employer, 
University of Helsinki, with LDAP
authentication) my user account has a high group ID (300), while the other 
machine is a personal laptop where I've
installed Ubuntu myself and my group ID is 1000.

I can try to fix this in a PR, but I don't think it's release critical. It's 
probably not a new issue, I just haven't
done a full Jena build on this machine, at least not after it was reinstalled 
some months ago.

-Osma

[1] https://maven.apache.org/plugins/maven-assembly-plugin/faq.html#tarFileModes


Andy Seaborne kirjoitti 30.10.2017 klo 13:20:

Hi,

Here is a vote on a release of Jena 3.5.0.

This is the second proposed candidate for a 3.5.0 release.

Note - the deadline is 18:00 UTC on Thursday - not midnight - so th

Re: Jena 3.5.0 RC2 plan

2017-10-28 Thread ajs6f

Back from Vienna!

Master just built beautifully for me on Mac OSX, from commit 
92c793b67dbb4138858106774d57b23418dd4ae5.


ajs6f

Andy Seaborne wrote on 10/27/17 4:42 PM:

Currently on master:

All the PRs are integrated (minimal version of #297 - tests are much faster, 
with a build taking about 12-13 minutes).

It's run a couple of times on Apache jenkins as well.

If anyone gets the chance to run from master on Windows or OSX, that would be 
great. TestDatabaseOps should be fixed.
TestProcessFileLock on Windows may well be but IMo is not a blocker to a 
release - its a test setup issue.

Andy

On 27/10/17 14:53, Andy Seaborne wrote:

There are test problems with TestProcessFileLock (on windows, consistently) and 
TestDatabaseOps (intermittent but
seems to like picking on Bruno).

There a couple of other small PRs for fixes which look safe as well.

PR #294 AdapterFileManager fix
(Rob - JENA-1405 can be resolved?)
PR #297 Elephas testing speed up
(slight discussion ongoing about details)

TestDatabaseOps:
PR #295 Use @Rule in testing to isolate tests.
PR #296 Control the test in disk-touching DBOE modules.

TestProcessFileLock:
PR #298 Isolate tests witha @Rule and improve/fix lock release.


I don't intend to rebuild the javadoc. As you may have noticed, I had some 
"fun" trying to get it staged. This cutting
corners but no public API is changing. If necessary, it can be done after 
pushing the release out.


Master is at version 3.6.0-SNAPSHOT so it goes back.
(and switching a unified versions number makes that easier)

To smooth the reset, mainly for any of us who have switched to the latest 
snapshot, I'll apply the PR's, we can test
the snapshot, then reset back to 3.5.0-SNAPSHOT just before the RC2 release 
build.

Please don't push to master without also letting dev@ know to make sure I don't 
miss anything.

Good plan? Anything missing?

 Andy



Re: @Test TestDatabaseOps.compact_prefixes_3 [Was Re: [] Release Apache Jena 3.5.0]

2017-10-25 Thread ajs6f

I did notice one warning when running the tests, but most likely unrelated, and 
expected for some test.

09:18:56 WARN  TDB   :: Location 
/home/kinow/Development/java/jena/jena/jena-tdb/target/tdb-testing/DB/ was not 
locked, if another JVM accessed this location simultaneously data corruption 
may have occurred



I see that all the time. I'm ashamed to admit I've never looked too closely into it. It's never errored me out of a 
build or appeared to have any consequence.



ajs6f

Bruno P. Kinoshita wrote on 10/25/17 10:22 PM:

Morning Andy,
I have access to a Windows machine at work where I can quickly start a build 
later this week.
Thanks for looking into it. Decided to reply e-mails and provide more 
information before leaving to the office, so that I had another chance at 
running the tests in this environment where the bugs always happen - i.e. not 
intermittent in my local workstation from what I can tell.

1/ Could you try running the test in isolation? it should run in Eclipse
by pointing at the test and running just that one @Test.


Sorry, I should have included this in the previous e-mail. Running the test in 
isolation in Eclipse works. Running the test in isolation in Maven also works 
(i.e. mvn clean test install -Dtest=TestDatabaseOps -DfailIfNoTests=no).

2/ Run with mvn -fn (--fail-never) which should make maven run the other 
>modules and tests so showing if there are any other problems.

Managed to wait just until this run of `mvn clean test install 
-Dmaven.javadoc.skip -fn` passed the TDB2 project. The issue happened, but I 
couldn't wait for the other tests to run. Gotta rush to the office. I did 
notice one warning when running the tests, but most likely unrelated, and 
expected for some test.

09:18:56 WARN  TDB   :: Location 
/home/kinow/Development/java/jena/jena/jena-tdb/target/tdb-testing/DB/ was not 
locked, if another JVM accessed this location simultaneously data corruption 
may have occurred
Going to build the project at work, and send a stack trace (didn't see one last 
night, but was running the test just before calling a day).
CheersBruno


  From: Andy Seaborne 
 To: dev@jena.apache.org
 Sent: Thursday, 26 October 2017 6:33 AM
 Subject: @Test TestDatabaseOps.compact_prefixes_3 [Was Re: [] Release Apache 
Jena 3.5.0]

Bruno,

The absence of the Data-0001 is very strange. It is created for every
test in the DatabaseMgr.connectDatasetGraph call and other tests work.

I don't have access to OSX or Windows for testing and compaction is
playing around with directories and files on disk (java, portable, ...)
but the code for compaction, and its tests, is quite recent.

I did run a build+test with source-release zip file on Linux.

A few things to try:

1/ Could you try running the test in isolation? it should run in Eclipse
by pointing at the test and running just that one @Test.

Is there a stacktrace?

2/ Run with mvn -fn (--fail-never) which should make maven run the other
modules and tests so showing if there are any other problems.

3/ I can see that parallel tests would mess it up the test setup. I've
just read the surefire plugin docs and I'm still not sure what he
default is - it might be thread per core.  Can you try forcing no
parallel tests please? [*]


I can't see a NPE place in line 142, I'm guessing it comes in the
Txn.executeRead that follows.

g.getPrefixMapping().getNsURIPrefix(

and I'll guess that its the second object access (getPrefixMapping())
but it might be deeper - no stack trace?

I can't see how that connects to the absence of Data-0001.


[*]
Module jena-tdb2 does not set up surefire and replies on defaults. It
should run TC_TDB

  
org.apache.maven.plugins
maven-surefire-report-plugin

  
**/TC_*.java
  

  


Andy


On 25/10/17 12:19, Bruno P. Kinoshita wrote:

I think one of the tests is failing when I run

`mvn clean test install`, and also when I run the same `mvn clean test -e -X 
-DforkMode=never` in debug mode in Eclipse.


The compact_prefixes_3 test method expects a directory like DB/Data-0001, but 
there is only a DB folder. The methods to create the Data-0001 switchable 
location was called, but for some reason nothing happened.

Didn't have much time to thoroughly investigate it, so will have to leave the 
error here for others to take a look. Will have more time to look into it 
tomorrow evening NZ time.
Running org.apache.jena.tdb2.sys.TestDatabaseOps
Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.028 sec <<< 
FAILURE! - in org.apache.jena.tdb2.sys.TestDatabaseOps
compact_prefixes_3(org.apache.jena.tdb2.sys.TestDatabaseOps)  Time elapsed: 0.053 sec  
<<< ERROR!
java.lang.NullPointerException
at 
org.apache.jena.tdb2.sys.TestDatabaseOps.compact_prefixes_3(TestDatabaseOps.java:142)

Running org.apache.jena.tdb2.

Re: [] Release Apache Jena 3.5.0

2017-10-25 Thread ajs6f

That sounds good to me.


ajs6f

Bruno P. Kinoshita wrote on 10/25/17 10:04 PM:

Hi Andy,
I'm suspecting on either a file not being really deleted by the JVM (i.e. IOX 
or Commons IO might be failing to do that), or a hidden bug somewhere else, not 
really in the compaction step.
And +1 on not blocking the release. Perhaps we could simply mark that test with 
a @Ignore + a comment about a known issue, and proceed with the release.
CheersBruno

  From: Andy Seaborne 
 To: dev@jena.apache.org
 Sent: Thursday, 26 October 2017 5:50 AM
 Subject: Re: [] Release Apache Jena 3.5.0

Bruno,

Thank you for running these tests.

== What to do about the 5.3.0 release

TDB2 is marked as experimental and I don't know how else to break the
deadlock of not getting used for real except by a release.  I've
hammered as much as I can.

The absence of Data-0001 suggests it is a test setup/teardown problem,
not a compaction problem. Compaction is currently quite difficult to
access (it isn't available live from Fuseki).

Options:

* Pull 5.3.0 and fix it. Unbounded wait.
* Remove TDB2 etc from 5.3.0.
* (if possible), identity the problem , then release with a note attached.

We can't have one part of Jena blocking the rest - there are still all
the incremental improvements and all the contributions to get out.

It's a compaction test and compaction does not remove data.  The
previous version of the database is accessed read-only (with writers
locked out, but it is a read transaction).

However I'm biased.

Andy

On 25/10/17 12:19, Bruno P. Kinoshita wrote:

I think one of the tests is failing when I run

`mvn clean test install`, and also when I run the same `mvn clean test -e -X 
-DforkMode=never` in debug mode in Eclipse.


The compact_prefixes_3 test method expects a directory like DB/Data-0001, but 
there is only a DB folder. The methods to create the Data-0001 switchable 
location was called, but for some reason nothing happened.

Didn't have much time to thoroughly investigate it, so will have to leave the 
error here for others to take a look. Will have more time to look into it 
tomorrow evening NZ time.
Running org.apache.jena.tdb2.sys.TestDatabaseOps
Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.028 sec <<< 
FAILURE! - in org.apache.jena.tdb2.sys.TestDatabaseOps
compact_prefixes_3(org.apache.jena.tdb2.sys.TestDatabaseOps)  Time elapsed: 0.053 sec  
<<< ERROR!
java.lang.NullPointerException
at 
org.apache.jena.tdb2.sys.TestDatabaseOps.compact_prefixes_3(TestDatabaseOps.java:142)

Running org.apache.jena.tdb2.sys.TestSys
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec - in 
org.apache.jena.tdb2.sys.TestSys
Running org.apache.jena.tdb2.sys.TestDatabaseConnection
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.103 sec - in 
org.apache.jena.tdb2.sys.TestDatabaseConnection
Running org.apache.jena.tdb2.assembler.TestTDBAssembler
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.449 sec - in 
org.apache.jena.tdb2.assembler.TestTDBAssembler
Running org.apache.jena.tdb2.TestDatabaseMgr
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.088 sec - in 
org.apache.jena.tdb2.TestDatabaseMgr
Running org.apache.jena.tdb2.solver.TestSolverTDB
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.048 sec - in 
org.apache.jena.tdb2.solver.TestSolverTDB
Running org.apache.jena.tdb2.solver.TestStats
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.013 sec - in 
org.apache.jena.tdb2.solver.TestStats

Results :

Tests in error:
TestDatabaseOps.compact_prefixes_3:142 » NullPointer

Tests run: 537, Failures: 0, Errors: 1, Skipped: 7

[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Jena - Project .. SUCCESS [  1.902 s]
[INFO] Apache Jena - Shadowed external libraries .. SUCCESS [ 16.081 s]
[INFO] Apache Jena - IRI .. SUCCESS [  5.970 s]
[INFO] Apache Jena - Base Common Environment .. SUCCESS [ 16.289 s]
[INFO] Apache Jena - Core . SUCCESS [01:28 min]
[INFO] Apache Jena - ARQ (SPARQL 1.1 Query Engine)  SUCCESS [01:30 min]
[INFO] Apache Jena - RDF Connection ... SUCCESS [  8.349 s]
[INFO] Apache Jena - TDB (Native Triple Store)  SUCCESS [ 24.995 s]
[INFO] Apache Jena - Database Operation Environment ... SUCCESS [  0.201 s]
[INFO] Apache Jena - DBOE Base  SUCCESS [  9.072 s]
[INFO] Apache Jena - DBOE Transactions  SUCCESS [  7.257 s]
[INFO] Apache Jena - DBOE Indexes . SUCCESS [  4.289 s]
[INFO] Apache Jena - DBOE Index test suite  SUCCESS [  4.663 s]
[INFO] Apache Jena - DBOE Transactional Datastructures  SUCCESS [ 11.756 s]
[INFO] Ap

Re: [] Release Apache Jena 3.5.0

2017-10-25 Thread ajs6f
I don't understand-- I thought the release is _always and only_ the source-- the artifacts are just a convenience we 
supply...?



ajs6f

Andy Seaborne wrote on 10/25/17 9:37 PM:

On 25/10/17 20:05, aj...@apache.org wrote:

Possible option: change the default Maven profile to skip TDB2 for this 3.5.0 
release?

We buy ourselves some time (at least until a potential 3.5.1 with a stabilized 
TDB2 test regime) but we still keep
TDB2 as available as possible.


Thanks for the suggestion - users consume Jena as maven artifacts and these 
match the release.


I'm quite uncomfortable with releasing with the main profile unstable, but I 
also don't want to block release on a
final fix for whatever Bruno has come across, and I also want to get TDB2 out 
there so that people can start to mess
with it (in the good way).

I am still in Vienna, so only 50% on at most, but I will try to reproduce 
Bruno's report.


ajs6f

Andy Seaborne wrote on 10/25/17 6:50 PM:

Bruno,

Thank you for running these tests.

== What to do about the 5.3.0 release

TDB2 is marked as experimental and I don't know how else to break the deadlock 
of not getting used for real except by a
release.  I've hammered as much as I can.

The absence of Data-0001 suggests it is a test setup/teardown problem, not a 
compaction problem. Compaction is currently
quite difficult to access (it isn't available live from Fuseki).

Options:

* Pull 5.3.0 and fix it. Unbounded wait.
* Remove TDB2 etc from 5.3.0.
* (if possible), identity the problem , then release with a note attached.

We can't have one part of Jena blocking the rest - there are still all the 
incremental improvements and all the
contributions to get out.

It's a compaction test and compaction does not remove data.  The previous 
version of the database is accessed read-only
(with writers locked out, but it is a read transaction).

However I'm biased.

Andy

On 25/10/17 12:19, Bruno P. Kinoshita wrote:

I think one of the tests is failing when I run

`mvn clean test install`, and also when I run the same `mvn clean test -e -X 
-DforkMode=never` in debug mode in
Eclipse.


The compact_prefixes_3 test method expects a directory like DB/Data-0001, but 
there is only a DB folder. The methods
to create the Data-0001 switchable location was called, but for some reason 
nothing happened.

Didn't have much time to thoroughly investigate it, so will have to leave the 
error here for others to take a look.
Will have more time to look into it tomorrow evening NZ time.
Running org.apache.jena.tdb2.sys.TestDatabaseOps
Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.028 sec <<< 
FAILURE! - in
org.apache.jena.tdb2.sys.TestDatabaseOps
compact_prefixes_3(org.apache.jena.tdb2.sys.TestDatabaseOps)  Time elapsed: 0.053 sec  
<<< ERROR!
java.lang.NullPointerException
at 
org.apache.jena.tdb2.sys.TestDatabaseOps.compact_prefixes_3(TestDatabaseOps.java:142)

Running org.apache.jena.tdb2.sys.TestSys
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec - in 
org.apache.jena.tdb2.sys.TestSys
Running org.apache.jena.tdb2.sys.TestDatabaseConnection
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.103 sec - in
org.apache.jena.tdb2.sys.TestDatabaseConnection
Running org.apache.jena.tdb2.assembler.TestTDBAssembler
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.449 sec - in
org.apache.jena.tdb2.assembler.TestTDBAssembler
Running org.apache.jena.tdb2.TestDatabaseMgr
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.088 sec - in 
org.apache.jena.tdb2.TestDatabaseMgr
Running org.apache.jena.tdb2.solver.TestSolverTDB
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.048 sec - in
org.apache.jena.tdb2.solver.TestSolverTDB
Running org.apache.jena.tdb2.solver.TestStats
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.013 sec - in 
org.apache.jena.tdb2.solver.TestStats

Results :

Tests in error:
   TestDatabaseOps.compact_prefixes_3:142 » NullPointer

Tests run: 537, Failures: 0, Errors: 1, Skipped: 7

[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Jena - Project .. SUCCESS [  1.902 s]
[INFO] Apache Jena - Shadowed external libraries .. SUCCESS [ 16.081 s]
[INFO] Apache Jena - IRI .. SUCCESS [  5.970 s]
[INFO] Apache Jena - Base Common Environment .. SUCCESS [ 16.289 s]
[INFO] Apache Jena - Core . SUCCESS [01:28 min]
[INFO] Apache Jena - ARQ (SPARQL 1.1 Query Engine)  SUCCESS [01:30 min]
[INFO] Apache Jena - RDF Connection ... SUCCESS [  8.349 s]
[INFO] Apache Jena - TDB (Native Triple Store)  SUCCESS [ 24.995 s]
[INFO] Apache Jena - Database Operation Environment ... SUCCESS [  0.201 s]
[INFO] Apache Jena -

Re: @Test TestDatabaseOps.compact_prefixes_3 [Was Re: [] Release Apache Jena 3.5.0]

2017-10-25 Thread ajs6f

Just checked out the jena-3.5.0-rc1 tag and ran the complete (`mvn clean 
install`) build successfully...

Mac OS 10.12.6

Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)

I didn't see the prob... love these intermittent bugs!

ajs6f

Andy Seaborne wrote on 10/25/17 9:39 PM:

Jena_Development_Test#2766 shows the same test failure. Ubuntu.

The changes for that run were unrelated to TDB2.

So we have an intermittent failure and it suggests it is the test harness (the 
test itself is entirely deterministic).

Jenkins is a bit unwell at the moment and things are very slow - there is a 
jena-core failure that occurs as well in an
area that has changed in a long while.

So Jenkins might trigger the intermittent test situation.

ajs6f - You ran tests for the version bump?
TDB2 is in the -Pdev profile.

Rob - did you run -Pdev?

Andy

On 25/10/17 18:26, Andy Seaborne wrote:

Bruno,

The absence of the Data-0001 is very strange. It is created for every test in 
the DatabaseMgr.connectDatasetGraph call
and other tests work.

I don't have access to OSX or Windows for testing and compaction is playing 
around with directories and files on disk
(java, portable, ...) but the code for compaction, and its tests, is quite 
recent.

I did run a build+test with source-release zip file on Linux.

A few things to try:

1/ Could you try running the test in isolation? it should run in Eclipse by 
pointing at the test and running just that
one @Test.

Is there a stacktrace?

2/ Run with mvn -fn (--fail-never) which should make maven run the other 
modules and tests so showing if there are any
other problems.

3/ I can see that parallel tests would mess it up the test setup. I've just 
read the surefire plugin docs and I'm
still not sure what he default is - it might be thread per core.  Can you try 
forcing no parallel tests please? [*]


I can't see a NPE place in line 142, I'm guessing it comes in the 
Txn.executeRead that follows.

g.getPrefixMapping().getNsURIPrefix(

and I'll guess that its the second object access (getPrefixMapping()) but it 
might be deeper - no stack trace?

I can't see how that connects to the absence of Data-0001.


[*]
Module jena-tdb2 does not set up surefire and replies on defaults. It should 
run TC_TDB

   
 org.apache.maven.plugins
 maven-surefire-report-plugin
 
   
 **/TC_*.java
   
 
   


 Andy


On 25/10/17 12:19, Bruno P. Kinoshita wrote:

I think one of the tests is failing when I run

`mvn clean test install`, and also when I run the same `mvn clean test -e -X 
-DforkMode=never` in debug mode in Eclipse.


The compact_prefixes_3 test method expects a directory like DB/Data-0001, but 
there is only a DB folder. The methods
to create the Data-0001 switchable location was called, but for some reason 
nothing happened.

Didn't have much time to thoroughly investigate it, so will have to leave the 
error here for others to take a look.
Will have more time to look into it tomorrow evening NZ time.
Running org.apache.jena.tdb2.sys.TestDatabaseOps
Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.028 sec <<< 
FAILURE! - in
org.apache.jena.tdb2.sys.TestDatabaseOps
compact_prefixes_3(org.apache.jena.tdb2.sys.TestDatabaseOps)  Time elapsed: 0.053 sec  
<<< ERROR!
java.lang.NullPointerException
at 
org.apache.jena.tdb2.sys.TestDatabaseOps.compact_prefixes_3(TestDatabaseOps.java:142)

Running org.apache.jena.tdb2.sys.TestSys
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec - in 
org.apache.jena.tdb2.sys.TestSys
Running org.apache.jena.tdb2.sys.TestDatabaseConnection
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.103 sec - in
org.apache.jena.tdb2.sys.TestDatabaseConnection
Running org.apache.jena.tdb2.assembler.TestTDBAssembler
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.449 sec - in
org.apache.jena.tdb2.assembler.TestTDBAssembler
Running org.apache.jena.tdb2.TestDatabaseMgr
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.088 sec - in 
org.apache.jena.tdb2.TestDatabaseMgr
Running org.apache.jena.tdb2.solver.TestSolverTDB
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.048 sec - in 
org.apache.jena.tdb2.solver.TestSolverTDB
Running org.apache.jena.tdb2.solver.TestStats
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.013 sec - in 
org.apache.jena.tdb2.solver.TestStats

Results :

Tests in error:
   TestDatabaseOps.compact_prefixes_3:142 » NullPointer

Tests run: 537, Failures: 0, Errors: 1, Skipped: 7

[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Jena - Project .. SUCCESS [  1.902 s]
[INFO] Apache Jena - Shadowed external li

Re: [] Release Apache Jena 3.5.0

2017-10-25 Thread ajs6f

Possible option: change the default Maven profile to skip TDB2 for this 3.5.0 
release?

We buy ourselves some time (at least until a potential 3.5.1 with a stabilized TDB2 test regime) but we still keep TDB2 
as available as possible.


I'm quite uncomfortable with releasing with the main profile unstable, but I also don't want to block release on a final 
fix for whatever Bruno has come across, and I also want to get TDB2 out there so that people can start to mess with it 
(in the good way).


I am still in Vienna, so only 50% on at most, but I will try to reproduce 
Bruno's report.


ajs6f

Andy Seaborne wrote on 10/25/17 6:50 PM:

Bruno,

Thank you for running these tests.

== What to do about the 5.3.0 release

TDB2 is marked as experimental and I don't know how else to break the deadlock 
of not getting used for real except by a
release.  I've hammered as much as I can.

The absence of Data-0001 suggests it is a test setup/teardown problem, not a 
compaction problem. Compaction is currently
quite difficult to access (it isn't available live from Fuseki).

Options:

* Pull 5.3.0 and fix it. Unbounded wait.
* Remove TDB2 etc from 5.3.0.
* (if possible), identity the problem , then release with a note attached.

We can't have one part of Jena blocking the rest - there are still all the 
incremental improvements and all the
contributions to get out.

It's a compaction test and compaction does not remove data.  The previous 
version of the database is accessed read-only
(with writers locked out, but it is a read transaction).

However I'm biased.

Andy

On 25/10/17 12:19, Bruno P. Kinoshita wrote:

I think one of the tests is failing when I run

`mvn clean test install`, and also when I run the same `mvn clean test -e -X 
-DforkMode=never` in debug mode in Eclipse.


The compact_prefixes_3 test method expects a directory like DB/Data-0001, but 
there is only a DB folder. The methods
to create the Data-0001 switchable location was called, but for some reason 
nothing happened.

Didn't have much time to thoroughly investigate it, so will have to leave the 
error here for others to take a look.
Will have more time to look into it tomorrow evening NZ time.
Running org.apache.jena.tdb2.sys.TestDatabaseOps
Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.028 sec <<< 
FAILURE! - in
org.apache.jena.tdb2.sys.TestDatabaseOps
compact_prefixes_3(org.apache.jena.tdb2.sys.TestDatabaseOps)  Time elapsed: 0.053 sec  
<<< ERROR!
java.lang.NullPointerException
at 
org.apache.jena.tdb2.sys.TestDatabaseOps.compact_prefixes_3(TestDatabaseOps.java:142)

Running org.apache.jena.tdb2.sys.TestSys
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec - in 
org.apache.jena.tdb2.sys.TestSys
Running org.apache.jena.tdb2.sys.TestDatabaseConnection
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.103 sec - in
org.apache.jena.tdb2.sys.TestDatabaseConnection
Running org.apache.jena.tdb2.assembler.TestTDBAssembler
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.449 sec - in
org.apache.jena.tdb2.assembler.TestTDBAssembler
Running org.apache.jena.tdb2.TestDatabaseMgr
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.088 sec - in 
org.apache.jena.tdb2.TestDatabaseMgr
Running org.apache.jena.tdb2.solver.TestSolverTDB
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.048 sec - in 
org.apache.jena.tdb2.solver.TestSolverTDB
Running org.apache.jena.tdb2.solver.TestStats
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.013 sec - in 
org.apache.jena.tdb2.solver.TestStats

Results :

Tests in error:
   TestDatabaseOps.compact_prefixes_3:142 » NullPointer

Tests run: 537, Failures: 0, Errors: 1, Skipped: 7

[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Jena - Project .. SUCCESS [  1.902 s]
[INFO] Apache Jena - Shadowed external libraries .. SUCCESS [ 16.081 s]
[INFO] Apache Jena - IRI .. SUCCESS [  5.970 s]
[INFO] Apache Jena - Base Common Environment .. SUCCESS [ 16.289 s]
[INFO] Apache Jena - Core . SUCCESS [01:28 min]
[INFO] Apache Jena - ARQ (SPARQL 1.1 Query Engine)  SUCCESS [01:30 min]
[INFO] Apache Jena - RDF Connection ... SUCCESS [  8.349 s]
[INFO] Apache Jena - TDB (Native Triple Store)  SUCCESS [ 24.995 s]
[INFO] Apache Jena - Database Operation Environment ... SUCCESS [  0.201 s]
[INFO] Apache Jena - DBOE Base  SUCCESS [  9.072 s]
[INFO] Apache Jena - DBOE Transactions  SUCCESS [  7.257 s]
[INFO] Apache Jena - DBOE Indexes . SUCCESS [  4.289 s]
[INFO] Apache Jena - DBOE Index test suite  SUCCESS [  4.663 s]
[INFO] Apache Jena - DBOE

more benchmarking

2017-10-25 Thread ajs6f

https://iswc2017.semanticweb.org/paper-70/

is a paper in the main conference track at ISWC. It is exercising Jena 2.3.0, but I'm not sure if that version number is 
for Jena/TDB or Fuseki. It is using Java 7, which makes me think they mean Fuseki 2.3.0 (Jena/TDB 3.0.0), which is still 
considerably out of date. Same story, different day...


--

ajs6f


[GitHub] jena pull request #294: Fix and tests for possible NPE (JENA-1405)

2017-10-25 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/294#discussion_r146827609
  
--- Diff: 
jena-arq/src/main/java/org/apache/jena/riot/adapters/AdapterFileManager.java ---
@@ -285,6 +286,12 @@ protected Model readModelWorker(Model model, String 
filenameOrURI, String baseUR
 if ( baseURI == null )
 baseURI = SysRIOT.chooseBaseIRI(filenameOrURI) ;
 try(TypedInputStream in = 
streamManager.openNoMapOrNull(mappedURI)) {
+if ( in == null )
+{
+if ( log.isDebugEnabled() )
+log.debug("Failed to locate '"+mappedURI+"'") ;
--- End diff --

As I understand it, `log.debug("Failed to locate '{}'", mappedURI) ` 
[avoids the need](https://www.slf4j.org/faq.html#logging_performance) to 
explicitly check `isDebugEnabled()`.


---


Re: github stuff Was: [2/2] jena git commit: JENA-1391: adding isEmpty method to Dataset

2017-10-24 Thread ajs6f
Super +1 to going to gitpubsub. I am agnostic (because I don't know enough to have a very informed opinion) about site 
processing tools. (I've had no problem using plain ol' Maven Site processing, but I never used it on very large 
projects, nothing with a site the size of Jena's.)


I had not heard of Jekyll, so I went and looked at https://jekyllrb.com/ but it appears to be a Ruby product? Would we 
run it somehow via JRuby from within Maven? Or as an exec task?



ajs6f

Bruno P. Kinoshita wrote on 10/23/17 5:11 AM:

Is there a git+CMS option? (or mirro git to SVN then ...)



More or less. It was enabled in 2015 
https://blogs.apache.org/infra/entry/git_based_websites_available

You must have the web site in the asf-site branch.
No mirror to SVN as far as I know...


CMS may not be around forever, and while the markdown isn't all >standard, it's quite 
close.  (e.g.  the processor "Title:" stuff)


OpenNLP had a few issues migrating to JBake, but a bit of IDE-fu + regex did 
the trick. The difference with Jena, I think, is that Jena's documentation is 
much more extensive, which means more pages to edit. But doable nevertheless.


I don't know what processor is behind CMS - python based? Home
grown/modified?


Home grown, like the old IRC bot/factoid code (written in Lua I think), the 
Help Wanted app (Lua too I believe). The CMS is a mix of Perl and Python.

Source code for the curious: https://svn.apache.org/repos/infra/websites/cms/


I've use jekyll (choice based on choosing a commonly used system to >increase 
the longevity and stability of the choice).


Jekyll is my preferred option as well, as there are heaps more documentation / 
examples / templates to re-use.

The work for OpenNLP is done in https://github.com/apache/opennlp-site, in the 
master branch. Then, there is a job somewhere, set up by ASF Infra, that pulls 
the master branch, runs `mvn clean package ...`, and then deploys the resulting 
static files generated onto the asf-site branch.

The site is served with the contents of that asf-site branch. One can build 
locally pull requests with `mvn clean package ...` and preview the web site, 
without having to wait and preview in the staging web site. Or even preview in 
GitHub pages as well.

The decision for JBake was already done when I joined, so I just helped with a 
few issues. It works quite well though. But if ASF Infra is able to build a 
Jekyll project, then we could use it instead.

Bruno




From: Andy Seaborne 
To: dev@jena.apache.org
Sent: Monday, 23 October 2017 3:09 AM
Subject: Re: github stuff Was: [2/2] jena git commit: JENA-1391: adding isEmpty 
method to Dataset



Is there a git+CMS option? (or mirro git to SVN then ...)

CMS may not be around forever, and while the markdown isn't all
standard, it's quite close.  (e.g.  the processor "Title:" stuff)

I don't know what processor is behind CMS - python based? Home
grown/modified?

I've use jekyll (choice based on choosing a commonly used system to
increase the longevity and stability of the choice).

Andy


On 16/10/17 19:51, Bruno P. Kinoshita wrote:

A few months ago I helped setting up OpenNLP's new website building from github.

The source is in an opennlp-site git repository, and is built with maven using 
the maven jbake static site generator.

Before they were using the svn cms pubsub if I recall correctly. Maybe we could 
have something similar if others like this approach.

CheersBruno

Sent from Yahoo Mail on Android

   On Tue, 17 Oct 2017 at 2:12, aj...@apache.org wrote:
Andy Seaborne wrote on 10/13/17 3:40 PM:

If anyone is interesting in following it up, I have read that Apache projects 
can now use gitbox where by all work is on
Github, including the full PR cycle, and the ASF is mirrored back.  To us, it 
looks like the GH is the master and ASF
the mirror (IIRC its a bit more complicated under the hood for INFRA than that).

 Andy


That sounds good to me. Is this the sort of thing for which I could just file a 
ticket on INFRA and follow up with them?

As long as we are digressing, you know what I would really love? Being able to 
do our docs/site in git/github. I'm
pretty sure other Apache projects manage to do that...

ajs6f




GitBox Was: github stuff

2017-10-24 Thread ajs6f
It's been surprisingly hard for me to find docs about how GitHub vs. Apache-side git works, but that may be my problem! 
:grin: I have found:


https://gitbox.apache.org/setup/

and linked my accounts. But I see no change in behavior-- I cannot push a test 
branch to Github.

So perhaps we (Jena) need to do something to enable this?

ajs6f


Andy Seaborne wrote on 10/22/17 4:02 PM:



On 16/10/17 14:12, aj...@apache.org wrote:


Andy Seaborne wrote on 10/13/17 3:40 PM:

If anyone is interesting in following it up, I have read that Apache projects 
can now use gitbox where by all work is on
Github, including the full PR cycle, and the ASF is mirrored back.  To us, it 
looks like the GH is the master and ASF
the mirror (IIRC its a bit more complicated under the hood for INFRA than that).

Andy


That sounds good to me. Is this the sort of thing for which I could just file a 
ticket on INFRA and follow up with them?


Step one is investigate - I've just seen it mentioned and don't know the 
details of what it does, aand what it does at
ASF.  Do you want to find out?

Andy


...


Re: Release Jena 3.5.0?

2017-10-23 Thread ajs6f

I have no problem with this plan, but just to check:

Andy, I believe you are thinking that the "obvious ones" are the ones that required no code changes and elicited no 
comments from you? Because if you want to include them, I can make a new PR quickly with just them (assuming I can fix 
whatever weird formatting thinking blew up that first PR).



ajs6f

Andy Seaborne wrote on 10/23/17 2:39 PM:

All being well (usual caveats about "things" happening), I'll do the release in 
the next few days.

I hope to include:

JENA-1403: Tidy up regex pattern handling.
#292

Bad regular expression patterns should throw ExprEvalException
#291

Spell checking some Javadocs
#290

The version bump PR#289 has some obvious ones to do and some things that need 
looking at.  But the small deltas are
obscured by an accidental reformat. As none are for bug fixes, I think that can 
wait. Close tracking of dependencies is
usually more helpful that occasional jumps.

If anything else low-risk turns up, I'll try to get it in - PRs work really 
well for this and there is no need to pause
anything. I'll see direct commits to master but please only for trivial, 
zero-risk things.

Andy

On 23/10/17 03:52, Bruno P. Kinoshita wrote:

Didn't have much time to contribute lately, so decided to spend some time 
during a Monday holiday here and spell check
javadocs & site.
Javadocs updates were done in https://github.com/apache/jena/pull/290
Site updates were done in r1812967.
Didn't find anything to fix looking at the templates in Fuseki2 web app. Might 
have more time to review JIRA and see
if there is any small issue for the web site. And I have a branch somewhere in 
one of my workstations with some tests,
to increase test coverage a bit.
Thanks for preparing 3.5.0!!!
Bruno

   From: Andy Seaborne 
  To: "dev@jena.apache.org" 
  Sent: Tuesday, 17 October 2017 11:32 AM
  Subject: Release Jena 3.5.0?
The tick is approaching.
Are we ready to go? JIRA to be marked resolved?

If so, I'll sort out a release soon.

 Andy

Here's a list of changes of note that I gathered:

 Release changes

Introducing TDB2:
http://jena.staging.apache.org/documentation/tdb2/

*TDB2 is not compatible with TDB1*

Compared to TDB1:
* No size limits on transactions : bulk uploads into a live Fuseki
   can e 100's of millions of triples.
* Models and Graphs can be passed across transactions
* No queue of delayed updates, no transaction backlog problems.
* "Writer pays" - readers don't
   All work for update is done on the writer thread.
* Datatypes of numerics preserved; xsd:doubles supported.

TDB2 is subject to change.

We solicit any and all feedback (good and bad!) about TDB2 to help
advance it to deployment-ready.

JENA-1390 : Add StmtIterator.toModel :

JENA-1392 : Add dynamic dataset support to SDB.

JENA-1395 : "--output RDF/XML" now prints using the basic block-oriented
writer, which uses less memory.  Use "--formatted" (same as "--pretty")
for pretty printed RDF/XML.

JENA-1398 :
Upgrade FOAF to add new spelling and deprecation of old for archaic FOAF
properties

== Dependency changes:

No license changes.

Upgrade jsonld-java to 0.11
   jackson to 2.9.0
   commons-fileuploader to 1.3.2->1.3.3
   commons-io 2.5 in jena-base
 (was pulled in anyway by jsonld-java)





Re: At ISWC in Vienna

2017-10-23 Thread ajs6f

Hi, Jean-Marc,

I am here in Vienna as well. I'm not sure if any other committers/PMC members are here, but I would be happy to meet 
about any Jena stuff you want to talk about.



---
A. Soroka
Research Computing : Office of the CIO : the Smithsonian Institution


Jean-Marc Vanel wrote on 10/23/17 9:58 AM:

Hi

I'm at ISWC in Vienna until wednesday.
If you also are there we could meet to chat.
( better answer privately )



[GitHub] jena pull request #289: Version bumps for 3.5

2017-10-21 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/289#discussion_r146109452
  
--- Diff: jena-project/pom.xml ---
@@ -1,867 +1,828 @@
 
-
-
-http://maven.apache.org/POM/4.0.0"; 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd";>
-  4.0.0
-
-  org.apache.jena
-  jena-project
-  pom
-  http://jena.apache.org/
-  3.5.0-SNAPSHOT
-  Apache Jena - Project
-
-  
-org.apache
-apache
-
-18
-
-  
-
-  
-
-  The Apache Software License, Version 2.0
-  http://www.apache.org/licenses/LICENSE-2.0.txt
-
-  
-  
-  
-The Apache Software Foundation
-http://www.apache.org/
-  
-
-  
-1.7.25
-1.2.17
-4.12
-2.11.0
-0.9.3
-
-
-0.11.1
-2.9.0
-
-2.5
-1.4
-
-3.4
-1.4
-0.7
-
-4.5.3
-4.4.6
-
-${ver.httpcore}
-${ver.httpclient}
-
-1.10
-6.4.1
-
-5.2.2
-
-2.7
-
-0.6
-
-1.9.5
-1.7.0
-
-1.8
-${jdk.version} 
-
-UTF-8
-
-MM-dd'T'HH:mm:ssZ
-0.1.5
-  
-
-  
-
-  
-  doclint-java8-disable
-  
-[1.8,)
-  
-
-  
-
-  
-org.apache.maven.plugins
-maven-javadoc-plugin
-
-  -Xdoclint:none
-
-  
-
-  
-
-  
-
-  
-  
-
-  
-junit
-junit
-${ver.junit}
-test
-  
-
-  
-xerces
-xercesImpl
-${ver.xerces}
-  
-
-  
-org.apache.httpcomponents
-httpclient-cache
-${ver.httpclient}
-
-  
-  
-commons-logging
-commons-logging
-  
-
-  
-
-  
-org.apache.httpcomponents
-httpclient
-${ver.httpclient}
-
-  
-  
-commons-logging
-commons-logging
-  
-
-  
-
-  
-commons-codec
-commons-codec
-${ver.commons-codec}
-  
-  
-  
-commons-io
-commons-io
-${ver.commonsio}
-  
-  
-  
-org.apache.thrift
-libthrift
-${ver.libthrift}
-
-  
-  
-org.apache.httpcomponents
-httpcore
-  
-  
-org.apache.commons
-commons-lang3
-  
-
-  
-
-  
-org.apache.commons
-commons-csv
-${ver.commonscsv}
-  
-
-  
-org.apache.commons
-commons-lang3
-${ver.commonslang3}
-  
-
-  
-commons-fileupload
-commons-fileupload
-1.3.3
-  
-
-  
-org.apache.commons
-commons-collections4
-4.1
-  
-  
-  
-  
-com.github.andrewoma.dexx
-collection
-${ver.dexxcollection}
-  
-  
-  
-com.github.jsonld-java
-jsonld-java
-${ver.jsonldjava}
-
-  
-commons-logging
-commons-logging
-  
-  
-  
-org.apache.httpcomponents
-httpclient-cache
-  
-  
-org.apache.httpcomponents
-httpclient
-  
-  
-org.apache.httpcomponents
-httpclient-osgi
-  
-  
-org.apache.httpcomponents
-httpcore-osgi
-  
-  
-org.slf4j
-slf4j-api
-  
-
-  
-  
-  
-  
-org.apache.lucene
-lucene-core
-${ver.lucene}
-jar
-  
-
-  
-org.apache.lucene
-lucene-analyzers-common
-${ver.lucene}
-  
-
-  
-org.apache.lucene
-lucene

[GitHub] jena issue #289: Version bumps for 3.5

2017-10-21 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/289
  
Wait, something has gone wrong here. The only changes I made were to the 
values of a few properties in that `pom.xml`. I have no idea why it's doing 
such a giant diff. I need to figure that out.


---


[GitHub] jena issue #289: Version bumps for 3.5

2017-10-21 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/289
  
Okay, let's let it hang until after the release then. And I'll file a Jira.


---


Re: Property Paths benchmark @ ISWC2017

2017-10-21 Thread ajs6f
I think Rob's suggested message is pretty reasonable. I think what we can do in this situation is to help open a larger 
conversation about what is fair and what is desirable for this kind of research.


ajs6f

Andy Seaborne wrote on 10/20/17 5:30 PM:



On 20/10/17 11:13, Rob Vesse wrote:


On 20/10/2017 15:56, "Andy Seaborne"  wrote:

 Given this, references to the 2015 are spurious and misleading.

  If you read the original bachelors thesis that Marco referenced [1] the 
equivalent text and the footnote is as follows:

3 https://jena.apache.org/ retrieved at 13.12.2015

Which would indeed be Jena 3.0.1, so the original research was started in 
December 2015 and completed sometime between
then and July 2016 when that thesis was submitted.


I'm not disputing that at all - but the average reader will read the paper and 
that's what it claims.  Clearly its wrong
because we look harder; others may take it at face value.



I would guess that when it was reformatted into a workshop paper they simply 
checked that all the URLs still worked
and updated the footnotes accordingly

  Maybe we are just splitting hairs and expecting too much, it just frustrates 
me when someone discovers a problem and
makes no effort to resolve it


+1



Rob

[1]
https://west.uni-koblenz.de/sites/default/files/studying/theses-files/bachelorarbeit-adrian-skubella-benchmarks-for-sparql-property-paths.pdf









[GitHub] jena pull request #289: Version bumps for 3.5

2017-10-21 Thread ajs6f
GitHub user ajs6f opened a pull request:

https://github.com/apache/jena/pull/289

Version bumps for 3.5

There are 5 commits here, the first 4 of which are (I think) 
non-controversial. In the last, to get from Commons Lang 3.4 to 3.5 (and thence 
to 3.6) I had to change test code.

I think the changes are kosher-- using `Z` instead of `+00:00` is legit 
according to the [XSD Dataypes 
spec](https://www.w3.org/TR/xmlschema11-2/#nt-tzFrag). But it might cause some 
surprises.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajs6f/jena VersionBumpsFor3.5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/289.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #289


commit 5f597bc598635a2553ca12a70a9d4e25a1b76246
Author: ajs6f 
Date:   2017-10-21T07:23:27Z

Bump contract test machinery versions

commit a0cf3320242dc1f456cd3e791760c16ff329d7a1
Author: ajs6f 
Date:   2017-10-21T07:32:38Z

Bump log4j2 version

commit 0a1a0caa637f14c7b37a15ed96971611af90a30b
Author: ajs6f 
Date:   2017-10-21T07:41:23Z

Commons lib version bumps

commit 0f82c01e1b71c136c1e1a5677e1ec58a207f23d6
Author: ajs6f 
Date:   2017-10-21T07:50:01Z

Bump Thrift version

commit 9d05531d6690162f460390e5427f406cf1ac415c
Author: ajs6f 
Date:   2017-10-21T09:52:58Z

Bumping Commons Lang 3.4 -> 3.6




---


[GitHub] jena issue #37: JENA-732 jena-maven-tools outputs to target/generated-source...

2017-10-21 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/37
  
@stain Just pinging again-- is this PR still in flight, or does it still 
make sense?


---


Re: Java 9 branch? (Was: Release Jena 3.5.0?)

2017-10-20 Thread ajs6f

Who manages the ASF parent pom.xml? INFRA? Perhaps we can help move it forward?


ajs6f

Claude Warren wrote on 10/20/17 11:18 PM:

Not really an immediate need so much as just wondering how close our code
is to working under Java 9.  I think it would also be nice to know when the
various tools we use are Java 9 ready and perhaps lend them a hand if need
be.  More curiosity than anything else.

Claude

On Fri, Oct 20, 2017 at 3:47 PM, Andy Seaborne  wrote:


Claude - you can see branches that exist via the GH interface.

And, no, theer isn't one.

There is a jenkins job - it does not work, waiting in updates to roll
through.  Taking over version mgt of the plugins from the ASF parent seems
to me like extra work for little benefit.

Unless there is an immediate need?

Andy


On 19/10/17 08:19, Claude Warren wrote:


Did we get a Java 9 branch started?  Seems like most of the issues are
around tooling not functionality of the product.  If this is the case I
would expect the differences between the java9 branch and the master to be
contained in the pom.xml files.

On Wed, Oct 18, 2017 at 2:20 AM, Andy Seaborne  wrote:

That would be good to see.


Personally, I think that ways to use modules in term good practice and
patterns, and also frameworks, in the java ecosystem will emerge but
anything we can do to reduce barriers seems like a good thing.

On Java9 generally:

The build itself doesn't work with Java9 because it needs updated
versions
of some plugins, and those are inherited from the Apache parent POM. To
take over the version control and override the std settings just seems
like
much work to get ahead by a short period of time.

I'm assuming we stay on java8 as the requirement for applications for a
while yet.

 Andy

On 17 October 2017 at 11:20, Aaron Coburn 
wrote:

Would it make sense to add an Automatic-Module-Name header to the



manifest


files so that Jena is easier to use in a JDK9 context?

I could even volunteer to do this.

Aaron


On Oct 17, 2017, at 9:56 AM, aj...@apache.org wrote:


Claude--

I see some updates available for the contract test machinery:

org.xenei:contract-test-maven-plugin .. 0.1.5 -> 0.1.7
org.xenei:junit-contracts . 0.1.5 -> 0.1.7

Worth doing before a release?


ajs6f

Andy Seaborne wrote on 10/16/17 6:32 PM:


The tick is approaching.
Are we ready to go? JIRA to be marked resolved?

If so, I'll sort out a release soon.

   Andy

Here's a list of changes of note that I gathered:

 Release changes

Introducing TDB2:
http://jena.staging.apache.org/documentation/tdb2/

*TDB2 is not compatible with TDB1*

Compared to TDB1:
* No size limits on transactions : bulk uploads into a live Fuseki
  can e 100's of millions of triples.
* Models and Graphs can be passed across transactions
* No queue of delayed updates, no transaction backlog problems.
* "Writer pays" - readers don't
  All work for update is done on the writer thread.
* Datatypes of numerics preserved; xsd:doubles supported.

TDB2 is subject to change.

We solicit any and all feedback (good and bad!) about TDB2 to help
advance it to deployment-ready.

JENA-1390 : Add StmtIterator.toModel :

JENA-1392 : Add dynamic dataset support to SDB.

JENA-1395 : "--output RDF/XML" now prints using the basic


block-oriented



writer, which uses less memory.  Use "--formatted" (same as



"--pretty")



for pretty printed RDF/XML.


JENA-1398 :
Upgrade FOAF to add new spelling and deprecation of old for archaic


FOAF



properties


== Dependency changes:

No license changes.

Upgrade jsonld-java to 0.11
  jackson to 2.9.0
  commons-fileuploader to 1.3.2->1.3.3
  commons-io 2.5 in jena-base
(was pulled in anyway by jsonld-java)
















Re: Property Paths benchmark @ ISWC2017

2017-10-20 Thread ajs6f

Perhaps the first line of work could be to contact the authors and ask them:

Did you contact Jena (or for that matter, any of the other projects) for this work? Why did you use such an old version 
of Jena?


Would you be willing to try again with a modern version? If the results are significantly different (as they almost 
certainly will be) would you be willing to make an emendation for your workshop paper?



ajs6f

Marco Neumann wrote on 10/19/17 12:10 PM:

just on a side note since this is "only" a workshop contribution it
will not make an appearance in the conference itself and will not
appear in the main ISWC  2017 conference proceedings published by
Springer but only as an independent publication of the workshop
itself.

responsibility for the workshop sits with the  Organising Committee

Axel-Cyrille Ngonga Ngomo, Institute for Applied Informatics, Leipzig, Germany
Anastasia Krithara, National Center for Scienti c Research
“Demokritos”, Athens, Greece
Irini Fundulaki, ICS-FORTH, Heraklion, Crete, Greece

and for review the Program Committee

Milos Jovanovik, OpenLink Software, United Kingdom
Pavlos Fafalios, University of Hannover. Germany
Kostas Stefanidis, University of Tampere, Finland
Muhammad Saleem, AKSW, University of Leipzig, Germany
Manolis Terrovitis, IMIS, RC Athena, Greece
Ricardo Usbeck, University of Leipzig, Germany
George Papastefanatos, IMIS RC Athena, Greece
Stasinos Kostantopoulos, NCSR Demokritos, Greece




On Thu, Oct 19, 2017 at 3:51 PM,   wrote:

I hadn't intended to spend time at the benchmarking sessions at ISWC, but if
it seems useful, I can try and raise this issue in person. I suppose partly
it's a question of setting the record straight, and then partly it's a
question of standing up for good practice, and then it's also a question of
protecting Jena from unmerited negative consequences.

I don't know how widely used such benchmarks are. Except for a few
high-profile projects, I rarely see anyone refer to this sort of evidence as
a reason to or not to adopt a system.


ajs6f

Marco Neumann wrote on 10/19/17 9:26 AM:


Rob,

unfortunately this is more common in Semantic Web research papers than
one might expect. I have seen this before in particular with regards
to perceived shortcomings of jena or its components. It might be a
good idea to bring this to the attention of affiliated people in the
organisation (here University of Southampton and Koblenz-Landau ).

while I don't think this is an intentional attempt to bring Jena into
disrepute the situation could be clarified and addressed by the ISWC
workshop or track chair as well. I wish your mentioned "standard
Industry and research practice" would be more common than it currently
is.

btw the thesis report is dated Juli 2016



On Thu, Oct 19, 2017 at 12:08 PM, Rob Vesse  wrote:


Marco

I don’t believe anyone has tried to contact them yet

I think that the complaints here are that there doesn’t appear to have
been any attempt to report the issues identified back to the projects
studied. If this was a security flaw in the project the standard Industry
and research practice would be to make a responsible disclosure to the
projects in advance of the public disclosure such that the researchers and
projects can work together to resolve the problem. The implication being
that it is irresponsible for the authors to benefit from pointing out flaws
in the projects while appearing to make no efforts to help report/resolve
those issues.

As you suggest this paper does appear to be based upon some thesis work,
that thesis indicates that the research was originally carried out in 2015
implying that the author knew of the issue two years ago.

The project has a relatively small core of developers most of whom work
on Jena on the side. We very much rely upon the wider community to provide
input on bugs that need to be resolved e.g. Performance issues and the
features we should prioritise. When someone clearly knew of a problem but
didn’t tell us that is inevitably frustrating for the project.

Rob

On 19/10/2017 10:08, "Marco Neumann"  wrote:

did you try to contact Daniel Janke, Adrian Skubella or Steffen Staab
to get a response?

the findings seem to based on work that has been published online as
part of a bachelor’s thesis by Adrian Skubella.


https://west.uni-koblenz.de/sites/default/files/studying/theses-files/bachelorarbeit-adrian-skubella-benchmarks-for-sparql-property-paths.pdf



On Thu, Oct 19, 2017 at 10:54 AM, Lorenz B.
 wrote:
> For me this is really bad practice. It also looks like they did the
> benchmark more than one year ago. Otherwise due to JENA-1195 this
error
> wouldn't occur anymore. And submission deadline was August 6th,
2017 .
> Their experiments contain 8 queries, rerunning those shouldn't take
ages...
>
> I'm currently trying to reproduce the results of 

Re: Property Paths benchmark @ ISWC2017

2017-10-19 Thread ajs6f
I hadn't intended to spend time at the benchmarking sessions at ISWC, but if it seems useful, I can try and raise this 
issue in person. I suppose partly it's a question of setting the record straight, and then partly it's a question of 
standing up for good practice, and then it's also a question of protecting Jena from unmerited negative consequences.


I don't know how widely used such benchmarks are. Except for a few high-profile projects, I rarely see anyone refer to 
this sort of evidence as a reason to or not to adopt a system.



ajs6f

Marco Neumann wrote on 10/19/17 9:26 AM:

Rob,

unfortunately this is more common in Semantic Web research papers than
one might expect. I have seen this before in particular with regards
to perceived shortcomings of jena or its components. It might be a
good idea to bring this to the attention of affiliated people in the
organisation (here University of Southampton and Koblenz-Landau ).

while I don't think this is an intentional attempt to bring Jena into
disrepute the situation could be clarified and addressed by the ISWC
workshop or track chair as well. I wish your mentioned "standard
Industry and research practice" would be more common than it currently
is.

btw the thesis report is dated Juli 2016



On Thu, Oct 19, 2017 at 12:08 PM, Rob Vesse  wrote:

Marco

I don’t believe anyone has tried to contact them yet

I think that the complaints here are that there doesn’t appear to have been any 
attempt to report the issues identified back to the projects studied. If this 
was a security flaw in the project the standard Industry and research practice 
would be to make a responsible disclosure to the projects in advance of the 
public disclosure such that the researchers and projects can work together to 
resolve the problem. The implication being that it is irresponsible for the 
authors to benefit from pointing out flaws in the projects while appearing to 
make no efforts to help report/resolve those issues.

As you suggest this paper does appear to be based upon some thesis work, that 
thesis indicates that the research was originally carried out in 2015 implying 
that the author knew of the issue two years ago.

The project has a relatively small core of developers most of whom work on Jena 
on the side. We very much rely upon the wider community to provide input on 
bugs that need to be resolved e.g. Performance issues and the features we 
should prioritise. When someone clearly knew of a problem but didn’t tell us 
that is inevitably frustrating for the project.

Rob

On 19/10/2017 10:08, "Marco Neumann"  wrote:

did you try to contact Daniel Janke, Adrian Skubella or Steffen Staab
to get a response?

the findings seem to based on work that has been published online as
part of a bachelor’s thesis by Adrian Skubella.


https://west.uni-koblenz.de/sites/default/files/studying/theses-files/bachelorarbeit-adrian-skubella-benchmarks-for-sparql-property-paths.pdf



On Thu, Oct 19, 2017 at 10:54 AM, Lorenz B.  
wrote:
> For me this is really bad practice. It also looks like they did the
> benchmark more than one year ago. Otherwise due to JENA-1195 this error
> wouldn't occur anymore. And submission deadline was August 6th, 2017 .
> Their experiments contain 8 queries, rerunning those shouldn't take 
ages...
>
> I'm currently trying to reproduce the results of the paper, but the
> whole experimental setup remains unclear. I'm wondering if they used
> just the Jena CLI or TDB. The same holds for RDF4J. I'm puzzled because
> the runtimes in the eval section are quite small, but even loading the
> data of their benchmark takes much more time. So maybe they used the
> RDF4J server.
>
> The worst thing is that they didn't contact any of the developers. Or
> did they talk to somebody here and then Andy created the ticket
> JENA-1195? Also for the other queries that failed, I would expect to see
> tickets on Apache JIRA or at least a hint on the Jena mailing list...
>
> @Andy I'm also wondering whether JENA-1317 addresses the problem with
> the empty result of benchmark query containing an inverse property path.
>
>
> On 18.10.2017 17:03, aj...@apache.org wrote:
>> As you know, Andy, I'm going to ISWC this year-- shall I buttonhole
>> them and give them our POV? :grin:
>>
>> In all seriousness, from what I can tell the results amount to "Using
>> older versions of our comparands and without contacting the projects
>> in question we couldn't find a store that implements every property
>> path feature correctly and some fail entirely."
>>
>> I'm not really sure how useful that information

Re: Property Paths benchmark @ ISWC2017

2017-10-18 Thread ajs6f

As you know, Andy, I'm going to ISWC this year-- shall I buttonhole them and 
give them our POV? :grin:

In all seriousness, from what I can tell the results amount to "Using older versions of our comparands and without 
contacting the projects in question we couldn't find a store that implements every property path feature correctly and 
some fail entirely."


I'm not really sure how useful that information is...? But I am ready to do a benchmarking paper for next year. Seems 
like it's a lot easier than I thought!



ajs6f


Andy Seaborne wrote on 10/17/17 9:28 AM:

Hi Lorenz,

Looks like JENA-1195 which is fixed.  Does that look like it?

I think it is shame when papers focus on bugs rather than discussing and even 
fixing them.  Bugs aren't research.

Path evaluation could improved to stream in more cases (that's why LIMIT didn't 
help), but 1195 explains the slowness
and memory.

Andy

On 17/10/17 07:58, Lorenz B. wrote:

Hi,

I just walked through the papers for the upcoming ISWC conference and
found a paper about benchmarking of SPARQL property paths [1] .

Not sure if this is relevant, but it looks like Jena has some issues
with different types of queries using the property path. For example,

SELECT ?o WHERE {A B* ?o.} LIMIT 100

lead to an OOM error on non-cyclic data. Here is the relevant part of
the paper:


While benchmarking Virtuoso, RDF4J and Allegrograph no errors or
exceptions have occurred. During the benchmark process of Jena an
OutOfMemoryError has been thrown whenever a query with the * operator
was used. In order to identify the cause of the error, the amount of
results the query should return has been limited to 100. The results
that have been returned by a query of the form SELECT ?o WHERE {A B*
?o.} LIMIT 100 where A and B are valid IRIs, consisted of 100 times A.
Due to this fact it is presumable that the query containing the *
operator returns A recursively until the main memory was full. To
ensure that this behaviour is not caused by cycles in the dataset a
query of the same form but with a predicate IRI that did not exist in
the dataset was executed. This query still returned 100 times A. This
indicates, that the * operator is not implemented correctly.

In addition, the experiments showed that:

Due to the problems with the * operator the queries 4, 7 and 8 could
not be processed. Additionally query 3, 5, and 6 returned no results
after 1 hour and thus, were aborted. Query 1 returned an empty and
thus, incomplete result set. Only for query 2 a valid result was
returned. Due to the lack of comparable results, Jena has been omitted
in the comparison of triple stores.


In the discussion section, they summarize the overall performance of Jena by


Jena could not return results for any query in under 1 hour besides
query 2. Furthermore, the * operator could not be evaluated at all and
the inverse operator returned empty result sets.


It looks like they used version 3.0.1, so maybe this doesn't hold
anymore for all of the queries. If not, it could be interesting to
improve performance and/or completeness.

I hope I didn't miss some open JIRA ticket, but in general I just wanted
to highlight the presence of some published benchmark for those kind of
queries.


Cheers,

Lorenz

[1] http://ceur-ws.org/Vol-1932/paper-04.pdf



Re: Release Jena 3.5.0?

2017-10-17 Thread ajs6f

Claude--

I see some updates available for the contract test machinery:

org.xenei:contract-test-maven-plugin .. 0.1.5 -> 0.1.7
org.xenei:junit-contracts . 0.1.5 -> 0.1.7

Worth doing before a release?


ajs6f

Andy Seaborne wrote on 10/16/17 6:32 PM:

The tick is approaching.
Are we ready to go? JIRA to be marked resolved?

If so, I'll sort out a release soon.

Andy

Here's a list of changes of note that I gathered:

 Release changes

Introducing TDB2:
http://jena.staging.apache.org/documentation/tdb2/

*TDB2 is not compatible with TDB1*

Compared to TDB1:
* No size limits on transactions : bulk uploads into a live Fuseki
   can e 100's of millions of triples.
* Models and Graphs can be passed across transactions
* No queue of delayed updates, no transaction backlog problems.
* "Writer pays" - readers don't
   All work for update is done on the writer thread.
* Datatypes of numerics preserved; xsd:doubles supported.

TDB2 is subject to change.

We solicit any and all feedback (good and bad!) about TDB2 to help
advance it to deployment-ready.

JENA-1390 : Add StmtIterator.toModel :

JENA-1392 : Add dynamic dataset support to SDB.

JENA-1395 : "--output RDF/XML" now prints using the basic block-oriented
writer, which uses less memory.  Use "--formatted" (same as "--pretty")
for pretty printed RDF/XML.

JENA-1398 :
Upgrade FOAF to add new spelling and deprecation of old for archaic FOAF
properties

== Dependency changes:

No license changes.

Upgrade jsonld-java to 0.11
   jackson to 2.9.0
   commons-fileuploader to 1.3.2->1.3.3
   commons-io 2.5 in jena-base
 (was pulled in anyway by jsonld-java)



github stuff Was: [2/2] jena git commit: JENA-1391: adding isEmpty method to Dataset

2017-10-16 Thread ajs6f


Andy Seaborne wrote on 10/13/17 3:40 PM:

If anyone is interesting in following it up, I have read that Apache projects 
can now use gitbox where by all work is on
Github, including the full PR cycle, and the ASF is mirrored back.  To us, it 
looks like the GH is the master and ASF
the mirror (IIRC its a bit more complicated under the hood for INFRA than that).

Andy


That sounds good to me. Is this the sort of thing for which I could just file a 
ticket on INFRA and follow up with them?

As long as we are digressing, you know what I would really love? Being able to do our docs/site in git/github. I'm 
pretty sure other Apache projects manage to do that...


ajs6f


Re: [2/2] jena git commit: JENA-1391: adding isEmpty method to Dataset

2017-10-13 Thread ajs6f
I did exactly that -- rebase branch over master, merge branch into master, and push to apache:master (which is what I 
usually do). I see them being different than the commits in the PR, but I can't see for the life of me why...


Anyway, I force-pushed to the PR-- that seems to have closed it.

ajs6f

Andy Seaborne wrote on 10/13/17 11:54 AM:

Adam,

I guess you pushed from your local repo to Jena Aapche git repo? Maybe after a 
rebase?

These aren't the commits on the PR.

Could you pull from GH? Or otherwise tidy up the PR?

(you can force push changes from your local repo to GH)

Thanks
Andy

On 13/10/17 15:40, aj...@apache.org wrote:

JENA-1391: adding isEmpty method to Dataset


Project: http://git-wip-us.apache.org/repos/asf/jena/repo
Commit: http://git-wip-us.apache.org/repos/asf/jena/commit/b792e8da
Tree: http://git-wip-us.apache.org/repos/asf/jena/tree/b792e8da
Diff: http://git-wip-us.apache.org/repos/asf/jena/diff/b792e8da

Branch: refs/heads/master
Commit: b792e8da1fbe7e397399f2b0803f4e28222c9c3e
Parents: 32de4dc
Author: ajs6f 
Authored: Thu Oct 12 10:18:41 2017 -0400
Committer: ajs6f 
Committed: Fri Oct 13 10:40:18 2017 -0400

--
  jena-arq/src/main/java/org/apache/jena/query/Dataset.java| 7 +++
  .../main/java/org/apache/jena/sparql/core/DatasetImpl.java   | 5 +
  .../org/apache/jena/sparql/core/AbstractTestDataset.java | 8 
  3 files changed, 20 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/jena/blob/b792e8da/jena-arq/src/main/java/org/apache/jena/query/Dataset.java
--
diff --git a/jena-arq/src/main/java/org/apache/jena/query/Dataset.java
b/jena-arq/src/main/java/org/apache/jena/query/Dataset.java
index db88642..539053a 100644
--- a/jena-arq/src/main/java/org/apache/jena/query/Dataset.java
+++ b/jena-arq/src/main/java/org/apache/jena/query/Dataset.java
@@ -113,4 +113,11 @@ public interface Dataset extends Transactional
   *  The dataset can not be used for query after this call.
   */
  public void close() ;
+
+/**
+ * @return Whether this {@code Dataset} is empty of graphs. Be aware of 
the semantic looseness inherent in
+ * https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#h_note_4";>the 
definition
+ * of RDF Datasets; whether a named graph exists if nothing is in it 
is implementation-specific.
+ */
+boolean isEmpty();
  }

http://git-wip-us.apache.org/repos/asf/jena/blob/b792e8da/jena-arq/src/main/java/org/apache/jena/sparql/core/DatasetImpl.java

--
diff --git a/jena-arq/src/main/java/org/apache/jena/sparql/core/DatasetImpl.java
b/jena-arq/src/main/java/org/apache/jena/sparql/core/DatasetImpl.java
index 2216d2f..00e419a 100644
--- a/jena-arq/src/main/java/org/apache/jena/sparql/core/DatasetImpl.java
+++ b/jena-arq/src/main/java/org/apache/jena/sparql/core/DatasetImpl.java
@@ -209,4 +209,9 @@ public class DatasetImpl implements Dataset
  if ( uri == null )
  throw new ARQException("null for graph name");
  }
+
+@Override
+public boolean isEmpty() {
+return dsg.isEmpty();
+}
  }

http://git-wip-us.apache.org/repos/asf/jena/blob/b792e8da/jena-arq/src/test/java/org/apache/jena/sparql/core/AbstractTestDataset.java

--
diff --git 
a/jena-arq/src/test/java/org/apache/jena/sparql/core/AbstractTestDataset.java
b/jena-arq/src/test/java/org/apache/jena/sparql/core/AbstractTestDataset.java
index 0ac1dee..b55991d 100644
--- 
a/jena-arq/src/test/java/org/apache/jena/sparql/core/AbstractTestDataset.java
+++ 
b/jena-arq/src/test/java/org/apache/jena/sparql/core/AbstractTestDataset.java
@@ -108,4 +108,12 @@ public abstract class AbstractTestDataset extends BaseTest
  assertFalse(model1.isIsomorphicWith(ds.getNamedModel(graphName))) ;
  assertTrue(model2.isIsomorphicWith(ds.getNamedModel(graphName))) ;
  }
+
+@Test public void dataset_06()
+{
+String graphName = "http://example/"; ;
+Dataset ds = createDataset() ;
+ds.addNamedModel(graphName, model1) ;
+assertFalse("Dataset should not be empty after a named graph has been 
added!", ds.isEmpty());
+}
  }



[GitHub] jena pull request #287: JENA-1391: adding isEmpty method to Dataset

2017-10-13 Thread ajs6f
Github user ajs6f closed the pull request at:

https://github.com/apache/jena/pull/287


---


[GitHub] jena pull request #288: JENA-1401 (fuseki backup) Don't use Jetty code in wa...

2017-10-13 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/288#discussion_r144571831
  
--- Diff: 
jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/async/AsyncPool.java
 ---
@@ -51,7 +51,13 @@ public AsyncTask submit(Runnable task, String 
displayName, DataService dataServi
 synchronized(mutex) {
 String taskId = Long.toString(++counter) ;
 Fuseki.serverLog.info(format("Task : %s : %s",taskId, 
displayName)) ;
-Callable c = Executors.callable(task) ;
+Callable c = ()->{
+try { task.run(); } 
+catch (Throwable th) {
+Fuseki.serverLog.warn(format("Exception in task %s 
execution", taskId), th);
--- End diff --

Does this qualify as an error? (Logging-wise)


---


[GitHub] jena pull request #287: JENA-1391: adding isEmpty method to Dataset

2017-10-12 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/287#discussion_r144359574
  
--- Diff: jena-arq/src/main/java/org/apache/jena/query/Dataset.java ---
@@ -113,4 +113,9 @@
  *  The dataset can not be used for query after this call.
  */
 public void close() ;
+
+/**
+ * @return Whether this {@code Dataset} is empty of triples, whether 
in the default graph or in any named graph.
--- End diff --

@afs  Better?


---


Re: Fuseki service extensibility

2017-10-12 Thread ajs6f

I'm not in a big hurry to work on this, but LDP access to a dataset might find 
that useful.


ajs6f

Andy Seaborne wrote on 10/12/17 11:54 AM:

JENA-1400 is a small step to providing some degree of flexibility in Fuseki for 
adding custom services to a dataset.
The JIRA is needed because currently the OperationName set is sealed.

I'm not seeing this as a common thing to do. Many things are better done (e.g. 
data conversion) output and streamed to
Fuseki.

The one I have mind is implementing a patch service (and using HTTP PATCH, as 
well as POST) based on RDF Patch [1].
Changes to datasets can be calculated elsewhere and the Fuseki dataset changed. 
 (It's quite hard to automatically
generate SPARQL Update for arbitrary changes if there are blank nodes involved.)

Any other use cases?

Andy

[1] https://afs.github.io/rdf-delta/


[GitHub] jena pull request #287: JENA-1391: adding isEmpty method to Dataset

2017-10-12 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/287#discussion_r144321650
  
--- Diff: jena-arq/src/main/java/org/apache/jena/query/Dataset.java ---
@@ -113,4 +113,11 @@
  *  The dataset can not be used for query after this call.
  */
 public void close() ;
+
+/**
+ * @return Whether this {@code Dataset} is empty of graphs. Be aware 
of the semantic looseness inherent in
+ * https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#h_note_4";>the 
definition
+ * of RDF Datasets; whether a named graph exists if nothing is in 
it is implementation-specific.
+ */
+boolean isEmpty();
--- End diff --

I didn't do that at first because it felt like a bit of a conflict against 
the rest of the API for `Dataset`, which discusses models/graphs and not 
tuples. But if you're okay with it, it doesn't bother me.


---


[GitHub] jena pull request #287: JENA-1391: adding isEmpty method to Dataset

2017-10-12 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/287#discussion_r144314830
  
--- Diff: 
jena-arq/src/test/java/org/apache/jena/sparql/core/AbstractTestDataset.java ---
@@ -108,4 +108,12 @@
 assertFalse(model1.isIsomorphicWith(ds.getNamedModel(graphName))) ;
 assertTrue(model2.isIsomorphicWith(ds.getNamedModel(graphName))) ;
 }
+
+@Test public void dataset_06()
+{
+String graphName = "http://example/"; ;
+Dataset ds = createDataset() ;
+ds.addNamedModel(graphName, model1) ;
+assertFalse("Dataset should not be empty after a named graph has 
been added!", ds.isEmpty());
+}
--- End diff --

See above-- so do we need a different impl of `isEmpty` for every kind of 
`DatasetGraph`?


---


[GitHub] jena pull request #287: JENA-1391: adding isEmpty method to Dataset

2017-10-12 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/287#discussion_r144314175
  
--- Diff: jena-arq/src/main/java/org/apache/jena/query/Dataset.java ---
@@ -113,4 +113,11 @@
  *  The dataset can not be used for query after this call.
  */
 public void close() ;
+
+/**
+ * @return Whether this {@code Dataset} is empty of graphs. Be aware 
of the semantic looseness inherent in
+ * https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#h_note_4";>the 
definition
+ * of RDF Datasets; whether a named graph exists if nothing is in 
it is implementation-specific.
+ */
+boolean isEmpty();
--- End diff --

Oh, fudge. Then we really can't have a default impl of this, can we?


---


[GitHub] jena pull request #287: JENA-1391: adding isEmpty method to Dataset

2017-10-12 Thread ajs6f
GitHub user ajs6f opened a pull request:

https://github.com/apache/jena/pull/287

JENA-1391: adding isEmpty method to Dataset

One of the asks from JENA-1391. Not by any means the whole ticket. :)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajs6f/jena JENA-1391isEmpty

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/287.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #287


commit 58653f788f76d8ebb76b4d97a1f75d2f1027824a
Author: ajs6f 
Date:   2017-10-12T14:18:41Z

JENA-1391: adding isEmpty method to Dataset




---


Re: Obfuscation Support?

2017-10-12 Thread ajs6f
I think that having the tooling available would be nothing but good. (Well, except for the hard work that Rob will have 
to do to make it happen. :g:) And I agree with Andy that we want to be careful about how we present it-- managing 
expectations is key. Perhaps we can make a point of providing the tooling in a way that moves users through some 
thinking about MCVE provision and so forth? I'm just imagining a page on the site where you get the tool, with that link 
wrapped in some useful guidance explaining the limitations that Andy discussed, how to be sure you are asking your 
question in a way that will get the best answers, etc.



Do we perhaps need to consider how we could make clear that there is an ability 
to purchase support from external vendors? Would it be possible to have a page 
on the website that provides a list of known support vendors, obviously with 
the appropriate disclaimers around nonendorsement, neutrality etc and the 
ability for anyone who asks to have their Company listed?


+1! I bet we can do this, well within Apache boundaries. For example, there are 
plenty of pages like:

https://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support


ajs6f

Rob Vesse wrote on 10/12/17 9:21 AM:

My intention was not for us to start offering a debugging service nor to stop 
expecting users to provide a minimal complete example.

My thinking is that it provides a way to help users in providing a complete 
example, I was not expecting that they would use it to submit their entire data 
sets. And clearly obfuscation does have limits, particularly when you consider 
things like typed literals where are you almost need to leave them alone in 
order for the obfuscated outputs to have any semblance of meaning and 
usefulness.

I totally agree that none of us has the time to dive into detailed debugging of 
users problems. Do we perhaps need to consider how we could make clear that 
there is an ability to purchase support from external vendors? Would it be 
possible to have a page on the website that provides a list of known support 
vendors, obviously with the appropriate disclaimers around nonendorsement, 
neutrality etc and the ability for anyone who asks to have their Company listed?

Rob

On 12/10/2017 12:36, "Andy Seaborne"  wrote:

Good question.

It might be valuable to add to the collection of tools.

I do have some concern about we are offering here though.

(1) if we offer to look at large datasets and/or large log files, then
work is moving from the user to the list.

(2) the obfuscated data is public. We don't want any
commitment/liability here that the code is, say, suitable for personal
data because sometimes obfuscation is not enough.


On the first point:

Part of a CMVE [1] is the user doing some work.  If we make it
acceptable to bypass that, the work still exists but it has been
transferred.

I simply can't spend 1+ hour setting up a test environment.  Performance
can involve load as well and I don't have the infrastructure to look at
that.

I'm more willing to spend time if the user is in a university/non-profit
or for people, commercial or otherwise, who engage in useful discussion.
A good report is a contribution.

But I'm not willing (or even able) to subsidise commercial organisations
per se. They can go find and pay for commercial support contract or
contract with someone (a contributor/committer maybe) and have a
confidentiality agreement.

It is not always one question in isolation.  Solve one issue and then
another arrives.

Sorry if this is grumpy but I can see ways things might turn out not so
well without us also having common agreement about how we operate on users@.

Andy

[1] and point to
https://stackoverflow.com/help/mcve

PS
There is also a theme of "ask first" before trying anything, or doing in
a few minutes investigation. Such emails are vague.



On 12/10/17 10:03, Rob Vesse wrote:
> Folks
>
>
>
>   An occasional recurring theme I see on the users list is we get a vague 
question about performance details where users can’t/won’t share Data and queries 
because of confidentiality or other concerns. This is something we’ve encountered 
in the past with customers for our commercial products and so internally we 
developed some obfuscation code using Jena APIs so that we can obfuscate queries 
and dates in our logs allowing customers to share these without confidentiality 
being breached.
>
>
>
>   Would it be valuable to the project if we cleaned this up and made it a 
part of core Jena libraries?
>
>
>
>   It would probably take a bit of time to unpick this from our code and 
to generalise it but I think it could be a very useful feature going forward. Let 
me know what you think
>
>
>
> Rob
>
>







Re: TDB2 merged

2017-10-07 Thread ajs6f
Okay, that makes sense. We might even just swap the "namespaces" at some future point when TDB2 becomes the default, 
i.e. go to tdbquery being for TDB2 and there being a tdb1.tdbquery, as a stop on the road to deprecation.


ajs6f
Andy Seaborne wrote on 10/7/17 9:42 AM:



On 06/10/17 21:17, aj...@apache.org wrote:

The commands are in the binary distribution "apache-jena" download but there 
are no script wrappers (easy to copy and
fix though).


Just a thought-- maybe better to add flags to the current scripts? Having 
all-new loader scripts for TDB2 would make
for three different bulk loader scripts...


Maybe though it's not so simple a thing to do as the scripts are a general 
wrapper template to call the java code.

For now, the TDB2 commands are of the form "tdb2.tdb*"

tdb2.tdbquery ...

Sometime, detecting the database type would be great but not critical path for 
the 3.5.0.

Andy




ajs6f

Andy Seaborne wrote on 10/6/17 7:36 AM:

That would be very helpful.

"documentation" is a task in the next few days. It's the block on sending any 
messages to users@ etc about it.


The raw material is in git:

https://github.com/apache/jena/blob/master/jena-db/use-fuseki-tdb2.md
https://github.com/apache/jena/blob/master/jena-db/use-tdb2-cmds.md

The commands are in the binary distribution "apache-jena" download but there 
are no script wrappers (easy to copy and
fix though).

Either run from development or

java -cp 'DIR/lib/*' tdb2.tdbloader ... args ...


some of my data files are too big to
be loaded via the Graph Store API.


From TDB2 and Fuseki's point of view, that's no longer true.
You can (should be able to) load any amount.

The fuseki-basic server also has TDB2 in it so if you are doing everything 
script-driven, you can run that "--conf
config-tdb2.ttl"

There is no progress indicator in the server log so you may wish to set set 
some kind of verbose option in the sender.

Andy

Uploading large files:

The UI does this all quite well.

What's the magic for a command line/scripted process?

It needs a tool that does not buffer or inspect the file or otherwise try to be 
helpful.

Anyone know of good tools for this?

I haven't managed to work out which set of "curl" arguments do this without 
buffering the file (--data* seem to
buffer the file; -F is a form upload, not pure
POST).

This seems to work:

wget --post-file=/home/afs/Datasets/BSBM/bsbm-200m.nt --header 'Content-type: 
application/n-triples'
http://localhost:3030/data

200M BSBM (49Gbytes) loaded at 42K triples/s.

The content length in the fuskei log is reported wrongly (1002691465 ... 
int/long error) but the triple count is right.

It does ruins the interactive performance of the machine!

s-post crashes immediately if given a large files - don't know why.

On 06/10/17 07:50, Osma Suominen wrote:

Excellent!

I have a couple of Fuseki installations where I could test drive this. I'd just 
need to know how to do the
configuration, and also a tool like tdbloader for
offline loading since some of my data files are too big to be loaded via the 
Graph Store API.

No hurry though.

-Osma


Andy Seaborne kirjoitti 04.10.2017 klo 00:43:

It's in the build joined in at apache-jena-libs.

It is in Fuseki2 server jar, but not the UI - a user needs to use a 
configuration file. That also works in
fuseki-basic.

Documentation to follow.

Andy





Re: TDB2 merged

2017-10-06 Thread ajs6f

The commands are in the binary distribution "apache-jena" download but there 
are no script wrappers (easy to copy and fix though).


Just a thought-- maybe better to add flags to the current scripts? Having 
all-new loader scripts for TDB2 would make for three different bulk loader 
scripts...


ajs6f

Andy Seaborne wrote on 10/6/17 7:36 AM:

That would be very helpful.

"documentation" is a task in the next few days. It's the block on sending any 
messages to users@ etc about it.


The raw material is in git:

https://github.com/apache/jena/blob/master/jena-db/use-fuseki-tdb2.md
https://github.com/apache/jena/blob/master/jena-db/use-tdb2-cmds.md

The commands are in the binary distribution "apache-jena" download but there 
are no script wrappers (easy to copy and fix though).

Either run from development or

java -cp 'DIR/lib/*' tdb2.tdbloader ... args ...


some of my data files are too big to
be loaded via the Graph Store API.


From TDB2 and Fuseki's point of view, that's no longer true.
You can (should be able to) load any amount.

The fuseki-basic server also has TDB2 in it so if you are doing everything script-driven, 
you can run that "--conf config-tdb2.ttl"

There is no progress indicator in the server log so you may wish to set set 
some kind of verbose option in the sender.

Andy

Uploading large files:

The UI does this all quite well.

What's the magic for a command line/scripted process?

It needs a tool that does not buffer or inspect the file or otherwise try to be 
helpful.

Anyone know of good tools for this?

I haven't managed to work out which set of "curl" arguments do this without 
buffering the file (--data* seem to buffer the file; -F is a form upload, not pure
POST).

This seems to work:

wget --post-file=/home/afs/Datasets/BSBM/bsbm-200m.nt --header 'Content-type: 
application/n-triples' http://localhost:3030/data

200M BSBM (49Gbytes) loaded at 42K triples/s.

The content length in the fuskei log is reported wrongly (1002691465 ... 
int/long error) but the triple count is right.

It does ruins the interactive performance of the machine!

s-post crashes immediately if given a large files - don't know why.

On 06/10/17 07:50, Osma Suominen wrote:

Excellent!

I have a couple of Fuseki installations where I could test drive this. I'd just 
need to know how to do the configuration, and also a tool like tdbloader for
offline loading since some of my data files are too big to be loaded via the 
Graph Store API.

No hurry though.

-Osma


Andy Seaborne kirjoitti 04.10.2017 klo 00:43:

It's in the build joined in at apache-jena-libs.

It is in Fuseki2 server jar, but not the UI - a user needs to use a 
configuration file. That also works in fuseki-basic.

Documentation to follow.

Andy





Re: [DRAFT] Jena report - October 2017

2017-10-05 Thread ajs6f

+1


ajs6f

Andy Seaborne wrote on 10/5/17 8:48 AM:

## Description:

Jena is a framework for developing Semantic Web and Linked Data
applications in Java. It provides implementation of W3C standards for
RDF and SPARQL.

## Issues:

There are no issues requiring board attention at this time.

## Activity:

Jena released version 3.4.0 2017-07-17

The project has received a software contribution of a new storage subsystem.  
Software grants from the main developer (who is also a committer but this work 
was
not done at Apache) and his employer, for most of the development period, have been 
obtained. This work was originally funded by a UK government R&D grant, with
the condition the work was open source.

## Health report:

The activity levels look normal.

## PMC changes:

 - Currently 12 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Adam Soroka on Mon Jun 06 2016

## Committer base changes:

 - Currently 15 committers.
 - No new committers added in the last 3 months
 - Last committer addition was Lorenz Buehmann at Fri Oct 28 2016

## Releases:

 - Last release was 3.4.0 on 2017-07-17

## JIRA activity:

 - 29 JIRA tickets created in the last 3 months
 - 30 JIRA tickets closed/resolved in the last 3 months



[GitHub] jena pull request #282: JENA-1393: Format prefix names

2017-10-03 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/282#discussion_r142396325
  
--- Diff: jena-arq/src/main/java/org/apache/jena/sparql/util/FmtUtils.java 
---
@@ -535,6 +535,7 @@ private static boolean validPNameChar(char ch)
 {
 if ( Character.isLetterOrDigit(ch) ) return true ;
 if ( ch == '.' )return true ;
+if ( ch == ':' )return true ;
--- End diff --

Oh, missed that. Yeah, I was reading some Scala the other day and it made 
me sad to come back to ADT-less Java. 😞 


---


[GitHub] jena pull request #282: JENA-1393: Format prefix names

2017-10-02 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/282#discussion_r142186147
  
--- Diff: jena-arq/src/main/java/org/apache/jena/sparql/util/FmtUtils.java 
---
@@ -535,6 +535,7 @@ private static boolean validPNameChar(char ch)
 {
 if ( Character.isLetterOrDigit(ch) ) return true ;
 if ( ch == '.' )return true ;
+if ( ch == ':' )return true ;
--- End diff --

This is getting long enough that it might read better as a `switch`/`case`.


---


[GitHub] jena pull request #281: Slice out old Codehaus JXR Maven plugin invocation

2017-10-01 Thread ajs6f
GitHub user ajs6f opened a pull request:

https://github.com/apache/jena/pull/281

Slice out old Codehaus JXR Maven plugin invocation



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajs6f/jena FixJXRPlugin

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/281.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #281


commit 4f925daf6b5dc31e6bd2faefdac9885fb9d3940f
Author: ajs6f 
Date:   2017-10-01T18:04:41Z

Slice out old Codehaus JXR Maven plugin invocation




---


Re: Codehaus JXR missing?

2017-10-01 Thread ajs6f

Sure, as long as it doesn't seem that there is any actual reason for it. (and 
it doesn't)


ajs6f

Andy Seaborne wrote on 10/1/17 1:43 PM:

especially as the Apache one is setup in jena-project/pom.xml.

Do you want to go and fix this?

Andy

On 01/10/17 15:35, aj...@apache.org wrote:

I just made a minor PR (content doesn't really matter) and the Travis CI build 
is repeatedly showing:

ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-dependency-plugin:2.10:resolve-plugins 
(resolve-plugins) on project jena-maven-tools: Nested::
Could not find artifact org.codehaus.mojo:jxr-maven-plugin:jar:1.5 in central 
(https://repo.maven.apache.org/maven2)
[ERROR]

which seems like of weird, both in that a plugin would be missing and that we 
seem to be using a Codehaus version of JXR--
anyone know if there is a particular reason we don't use the Apache version:

http://maven.apache.org/jxr/maven-jxr-plugin/

?



Codehaus JXR missing?

2017-10-01 Thread ajs6f

I just made a minor PR (content doesn't really matter) and the Travis CI build 
is repeatedly showing:

ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:2.10:resolve-plugins (resolve-plugins) on project jena-maven-tools: Nested:: 
Could not find artifact org.codehaus.mojo:jxr-maven-plugin:jar:1.5 in central (https://repo.maven.apache.org/maven2)

[ERROR]

which seems like of weird, both in that a plugin would be missing and that we 
seem to be using a Codehaus version of JXR--
anyone know if there is a particular reason we don't use the Apache version:

http://maven.apache.org/jxr/maven-jxr-plugin/

?

--

ajs6f


[GitHub] jena pull request #280: Deprecate Jena's Callback in favor of Java API's ...

2017-10-01 Thread ajs6f
GitHub user ajs6f opened a pull request:

https://github.com/apache/jena/pull/280

Deprecate Jena's Callback in favor of Java API's Consumer

Seems like we could deprecate `Callback` in the next release and remove 
in the following, unless I am missing something about the contract for 
callbacks.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajs6f/jena CallbackToConsumer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/280.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #280


commit 26592ee47d2203cfa8165d350c4f07908e760ee0
Author: ajs6f 
Date:   2017-10-01T14:00:06Z

Deprecate Jena's Callback in favor of Java API's Consumer




---


Re: [VOTE] Accept contribution of TDB2

2017-09-22 Thread ajs6f

+1


ajs6f

Osma Suominen wrote on 9/22/17 10:25 AM:

Andy Seaborne kirjoitti 22.09.2017 klo 16:55:

This VOTE is to accept a contribution of software for TDB2 comprising of the 
contents of the GitHub repository:

   https://github.com/afs/mantis

as of commit 71a70fd76ebc35cda26258bad0459e97f9860b04 (2017-09-22)
subject to software grants from Epimorphics Ltd and Andy Seaborne, which cover 
the entire contribution.

Please vote to approve receiving this contribution:

   [ ] +1 Accept the contribution
   [ ] -1 Don't accept the contribution because ...


+1

-Osma




Re: eclipse and shaded guava?

2017-09-11 Thread ajs6f

This is a long-standing annoyance caused by our need to shade a modern version 
of Guava into the code to avoid conflicting with the very old version in Hadoop.

Do you have the jena0-shaded-guava project open in Eclipse? The problem usually 
goes away if it is closed.


ajs6f
Chris Tomlinson wrote on 9/11/17 1:47 PM:

Hi,

I’m having a bit of a hassle getting eclipse Mars 4.5.2 to hook up properly 
with imports like:


import org.apache.jena.ext.com.google.common.cache.CacheBuilder ;
import org.apache.jena.ext.com.google.common.cache.CacheStats ;


I "git clone" jena and

mvn clean install
mvn eclipse:eclipse

and then import the various submodules as existing maven projects into eclipse. 
Once the imports complete there are a few of the submodules with syntax errors 
in eclipse centered on the shaded guava. The projects with errors all have

jena-shaded-guava

as a project dependency in the .project and also a library reference to 
M2_REPO/com/google/guava/guava/21.0/guava-21.0.jar in the .classpath.

The jena repo and submodules build and test fine from the command line.

I’ve run maven update project on all of the jena projects and once the “update 
project” process completes the errors are cleared (a result of “clean projects” 
being checked) from all of the projects and then during the “building 
workspace” process the errors reappear one-by-one as the workspace is rebuilt.

I appreciate any ideas about what I’m stumbling on.

Thanks,
Chris




Re: Jena over Cassandra?

2017-09-05 Thread ajs6f

No, I had not seen that, thanks! Looks very interesting!


ajs6f

Phil Coates wrote on 9/5/17 11:04 AM:

Have you looked at CM-Well (https://github.com/thomsonreuters/CM-Well)?

This is based on Cassandra and ElasticSearch.


*Philip Coates*

philip.coa...@semanticintegration.co.uk 
<mailto:philip.coa...@semanticintegration.co.uk>
philip.coa...@sparqlr.com <mailto:philip.coa...@sparqlr.com>
skype:philip.coates.76
Tel: +44 (0)7711 818384

*SemanticIntegration* <http://www.semanticintegration.co.uk/>

On 5 September 2017 at 15:40, mailto:aj...@apache.org>> 
wrote:

The requirements for distributed storage are actually that DRAS-TIC (see 
that grant description) be used, and DRAS-TIC is 100% based around Cassandra, so
effectively, the requirement is that Cassandra be used, at least at core. So 
part of what I am wondering (if it's not obvious) is "If we're going to have a
Cassandra cluster as part of this, how can we get as much mileage as possible 
out of it?"

I know that Cassandra offers some ordering capabilities out-of-the-box, 
although I'm not familiar with them. Maybe they could be used to support merge 
join
generally.

CumulusRDF (as shown in that paper I forwarded) uses a structure in which 
they mostly leave column values empty. The information is stored entirely in the
keys, and use is made of prefix lookup. Does your system do something like 
that, Claude? It sounds like you are storing tuple component in the column 
values.


ajs6f

Andy Seaborne wrote on 9/5/17 4:43 AM:


On Mon, Sep 4, 2017 at 12:10 PM, mailto:aj...@apache.org>> wrote:

Little of both? :grin:

Primarily I am interested because of a grant [1] in which 
the Smithsonian
Institution (where I work) is participating in a supporting 
role (partly
because I convinced us to). That work involves using 
Cassandra for
distributed storage, and it will also involve a distributed 
LDP
implementation (the Fedora API referred to in that grant 
description is
really just a packaging of Memento [2] with LDP [3]), hence 
my interest in
jena-on-cassandra.


Turning this round - what are the requirements for the distributed 
storage?

As I understand the join question, the usual move with 
Cassandra is to
denormalize and store the joined data together, but that's 
obviously
nontrivial in our situation, where we don't know the 
potential queries.
Have you looked at an indexing solution such as was used by 
CumulusRDF [4]?


(single graph example)

If Cassandra has stored PSO and POS then parallel merge joins are 
possible.

Andy


ajs6f

[1] https://www.imls.gov/grants/awarded/lg-71-17-0159-17 
<https://www.imls.gov/grants/awarded/lg-71-17-0159-17>
[2] http://www.mementoweb.org/guide/quick-intro/ 
<http://www.mementoweb.org/guide/quick-intro/>
[3] https://www.w3.org/TR/ldp/
[4] http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Worksh 
<http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Worksh>
ops/SSWS/Ladwig-et-all-SSWS2011.pdf

Claude Warren wrote on 9/2/17 12:44 PM:

are you looking to use jena-on-cassandra or do you have 
ideas?  what leads

you to ask about it?


On Sat, Sep 2, 2017 at 1:21 PM, mailto:aj...@apache.org>> wrote:

Hey, Claude--


Just curious as to where 
https://github.com/Claudenw/jena-on-cassandra 
<https://github.com/Claudenw/jena-on-cassandra>
has ended up. Is that still work-in-progress?

--

ajs6f







--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren 
<http://www.linkedin.com/in/claudewarren>







Re: Jena over Cassandra?

2017-09-05 Thread ajs6f
The requirements for distributed storage are actually that DRAS-TIC (see that grant description) be used, and DRAS-TIC is 100% based around Cassandra, so 
effectively, the requirement is that Cassandra be used, at least at core. So part of what I am wondering (if it's not obvious) is "If we're going to have a 
Cassandra cluster as part of this, how can we get as much mileage as possible out of it?"


I know that Cassandra offers some ordering capabilities out-of-the-box, although I'm not familiar with them. Maybe they could be used to support merge join 
generally.


CumulusRDF (as shown in that paper I forwarded) uses a structure in which they mostly leave column values empty. The information is stored entirely in the keys, 
and use is made of prefix lookup. Does your system do something like that, Claude? It sounds like you are storing tuple component in the column values.



ajs6f

Andy Seaborne wrote on 9/5/17 4:43 AM:



On Mon, Sep 4, 2017 at 12:10 PM,  wrote:


Little of both? :grin:

Primarily I am interested because of a grant [1] in which the Smithsonian
Institution (where I work) is participating in a supporting role (partly
because I convinced us to). That work involves using Cassandra for
distributed storage, and it will also involve a distributed LDP
implementation (the Fedora API referred to in that grant description is
really just a packaging of Memento [2] with LDP [3]), hence my interest in
jena-on-cassandra.


Turning this round - what are the requirements for the distributed storage?


As I understand the join question, the usual move with Cassandra is to
denormalize and store the joined data together, but that's obviously
nontrivial in our situation, where we don't know the potential queries.
Have you looked at an indexing solution such as was used by CumulusRDF [4]?


(single graph example)

If Cassandra has stored PSO and POS then parallel merge joins are possible.

Andy



ajs6f

[1] https://www.imls.gov/grants/awarded/lg-71-17-0159-17
[2] http://www.mementoweb.org/guide/quick-intro/
[3] https://www.w3.org/TR/ldp/
[4] http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Worksh
ops/SSWS/Ladwig-et-all-SSWS2011.pdf

Claude Warren wrote on 9/2/17 12:44 PM:

are you looking to use jena-on-cassandra or do you have ideas?  what leads

you to ask about it?


On Sat, Sep 2, 2017 at 1:21 PM,  wrote:

Hey, Claude--


Just curious as to where https://github.com/Claudenw/jena-on-cassandra
has ended up. Is that still work-in-progress?

--

ajs6f









--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren







Re: Jena over Cassandra?

2017-09-04 Thread ajs6f

Little of both? :grin:

Primarily I am interested because of a grant [1] in which the Smithsonian Institution (where I work) is participating in a supporting role (partly because I 
convinced us to). That work involves using Cassandra for distributed storage, and it will also involve a distributed LDP implementation (the Fedora API referred 
to in that grant description is really just a packaging of Memento [2] with LDP [3]), hence my interest in jena-on-cassandra.


As I understand the join question, the usual move with Cassandra is to denormalize and store the joined data together, but that's obviously nontrivial in our 
situation, where we don't know the potential queries. Have you looked at an indexing solution such as was used by CumulusRDF [4]?


ajs6f

[1] https://www.imls.gov/grants/awarded/lg-71-17-0159-17
[2] http://www.mementoweb.org/guide/quick-intro/
[3] https://www.w3.org/TR/ldp/
[4] 
http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf

Claude Warren wrote on 9/2/17 12:44 PM:

are you looking to use jena-on-cassandra or do you have ideas?  what leads
you to ask about it?


On Sat, Sep 2, 2017 at 1:21 PM,  wrote:


Hey, Claude--

Just curious as to where https://github.com/Claudenw/jena-on-cassandra
has ended up. Is that still work-in-progress?

--

ajs6f







Jena over Cassandra?

2017-09-02 Thread ajs6f

Hey, Claude--

Just curious as to where https://github.com/Claudenw/jena-on-cassandra has 
ended up. Is that still work-in-progress?

--

ajs6f


[GitHub] jena issue #233: Added mosaic and thrift packages to org.apache.jena.sparql....

2017-08-21 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/233
  
@afs Okay, now I see what you mean. Yeah, insofar as this gear is trying to 
"federate" `DatasetGraph`s, it doesn't make sense to penetrate that abstraction 
to reach `TriTable` and `HexTable`, which are really just implementation 
constructs for TIM. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] jena issue #233: Added mosaic and thrift packages to org.apache.jena.sparql....

2017-08-21 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/233
  
@afs No, I haven't fooled with it at all because I didn't want to spend 
that time until @dick-twocows confirmed that it was ready for other eyes.
Re: `StreamRDFTriHexTable` I didn't see that in `afs/jena:master` or in 
`afs/mantis:master`-- where is it?

I'm certainly +1 to @afs's comments about it being better to have some new 
modules than more code in the core, although distributed operation is very 
important in the future, I think, and I could imagine this stuff migrating into 
the core at some point.

@afs is asking for some clarity on how this stuff is laid out-- one way 
might be for @dick-twocows  to add package comments with a solid description in 
each of what that package does.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] jena issue #274: JENA-1381: Use all information in the cache key (text queri...

2017-08-20 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/274
  
I'm not that worried about this case. (Although I would actually have fewer 
special graph names and more types, but that's just my taste; I'm not arguing 
that we should change that now.) It was more your first remark about "Else, 
we'd end up with `Optional` all over the place" and the fact that I don't 
feel like we have a clear way to make any changes at all to the core SPI and 
API. This (PR) isn't really the right place for the larger discussion-- I'll 
take it to dev@.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] jena issue #233: Added mosaic and thrift packages to org.apache.jena.sparql....

2017-08-20 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/233
  
Hey, @dick-twocows and @afs, just picking up this conversation. Thanks for 
the work so far, @dick-twocows! Do you feel like this is in a state ready for 
in-depth review, or are you still working with it? @afs, does @dick-twocows's 
comment above gives a good sense of the contribution, or were you looking for 
something more in-depth? I think it makes a good outline and there's not much 
point to filling in a lot of detail until we are sure the contribution is close 
to finished.

I think it would be great to get this into the next release and I would be 
happy to a) work with @dick-twocows to help make that happen and b) cut that 
release. (Although as I never tire of complaining, it would also be great for 
another committer to do that :) ).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] jena issue #274: JENA-1381: Use all information in the cache key (text queri...

2017-08-20 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/274
  
If we ever want to use `Optional` at all (and I would, I think it is clear 
and avoids special names in many cases), we have to start somewhere (or we have 
to make a massive sudden change to the Graph SPI, maybe 4.0?). I don't want to 
make a fuss, I would just like to be able to gradually introduce it. Maybe not 
on this PR, and maybe gradual isn't better than a big change at 4.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] jena pull request #275: JENA-1383: Improve handling of bad character encodin...

2017-08-20 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/275#discussion_r134117416
  
--- Diff: 
jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/servlets/ActionSPARQL.java
 ---
@@ -205,7 +206,11 @@ public static void parse(HttpAction action, StreamRDF 
dest, InputStream input, L
 .lang(lang)
 .base(base)
 .parse(dest);
-} 
+} catch (RuntimeIOException ex) {
+if ( ex.getCause() instanceof CharacterCodingException )
+throw new RiotException("Character Coding Error: 
"+ex.getMessage());
--- End diff --

maybe `throw new RiotException("Character Coding Error: "+ex.getMessage(), 
ex.getCause());`  to keep more context?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] jena pull request #273: JENA-1372: fn:apply and fn:collation-key

2017-08-18 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/273#discussion_r133993197
  
--- Diff: 
jena-arq/src/main/java/org/apache/jena/sparql/function/library/FN_Apply.java ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.function.library;
+
+import java.util.List ;
+
+import org.apache.jena.atlas.lib.Cache ;
+import org.apache.jena.atlas.lib.CacheFactory ;
+import org.apache.jena.graph.Node ;
+import org.apache.jena.sparql.expr.ExprEvalException ;
+import org.apache.jena.sparql.expr.ExprList ;
+import org.apache.jena.sparql.expr.NodeValue ;
+import org.apache.jena.sparql.function.Function ;
+import org.apache.jena.sparql.function.FunctionBase ;
+import org.apache.jena.sparql.function.FunctionFactory ;
+import org.apache.jena.sparql.function.FunctionRegistry ;
+import org.apache.jena.sparql.sse.builders.ExprBuildException ;
+import org.apache.jena.sparql.util.Context ;
+
+/** XPath and XQuery Functions and Operators 3.1
+ *  
+ * {@code fn:apply(function, args)}
+ */
+public class FN_Apply extends FunctionBase {
+// Assumes one object per use site. 
+private Cache cache1 = 
CacheFactory.createOneSlotCache();
+
+@Override
+public void checkBuild(String uri, ExprList args) {
+if ( args.isEmpty() )
+throw new ExprBuildException("fn:apply: no function to call 
(minimum number of args is one)");
+}
+@Override
+public NodeValue exec(List args) {
+if ( args.isEmpty() )
+throw new ExprBuildException("fn:apply: no function to call 
(minimum number of args is one)");
+NodeValue functionId = args.get(0);
+List argExprs = args.subList(1,args.size()) ; 
+ExprList exprs = new ExprList();
+argExprs.forEach((a)->exprs.add(a));
--- End diff --

ARGH OMG I hate you Java generics. I've gotten used to Scala's flexibility.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] jena pull request #273: JENA-1372: fn:apply and fn:collation-key

2017-08-18 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/273#discussion_r133992495
  
--- Diff: 
jena-arq/src/main/java/org/apache/jena/sparql/function/library/FN_Apply.java ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.function.library;
+
+import java.util.List ;
+
+import org.apache.jena.atlas.lib.Cache ;
+import org.apache.jena.atlas.lib.CacheFactory ;
+import org.apache.jena.graph.Node ;
+import org.apache.jena.sparql.expr.ExprEvalException ;
+import org.apache.jena.sparql.expr.ExprList ;
+import org.apache.jena.sparql.expr.NodeValue ;
+import org.apache.jena.sparql.function.Function ;
+import org.apache.jena.sparql.function.FunctionBase ;
+import org.apache.jena.sparql.function.FunctionFactory ;
+import org.apache.jena.sparql.function.FunctionRegistry ;
+import org.apache.jena.sparql.sse.builders.ExprBuildException ;
+import org.apache.jena.sparql.util.Context ;
+
+/** XPath and XQuery Functions and Operators 3.1
+ *  
+ * {@code fn:apply(function, args)}
+ */
+public class FN_Apply extends FunctionBase {
+// Assumes one object per use site. 
+private Cache cache1 = 
CacheFactory.createOneSlotCache();
+
+@Override
+public void checkBuild(String uri, ExprList args) {
+if ( args.isEmpty() )
+throw new ExprBuildException("fn:apply: no function to call 
(minimum number of args is one)");
+}
+@Override
+public NodeValue exec(List args) {
+if ( args.isEmpty() )
--- End diff --

Okay, so the checks could be factored out and either called from both 
`checkBuild` and `exec`, right? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Custom querying algorithm in Jena

2017-08-18 Thread ajs6f

This might be a better question for the Jena dev@ list. I'm copying it there.

In any event, can you say a little more about what you mean by "a new querying 
algorithm"? Presumably you have some specific technique you are investigating?


ajs6f

e1425...@student.tuwien.ac.at wrote on 8/18/17 10:37 AM:

Dear Jena development-community,
my name is Markus Buchta and I am student at the University of Technology of 
Vienna.
For my bachelor's thesis I want to implement a new querying algorithm into Jena.
Since the the project is pretty large and pretty hard to understand for a new 
developer, I want to know if you have any tips for me?
I asking myself where should I start and what is even possible to change at the 
query evaluation process?

I want to already thank you for your help and wish you a nice weekend.

Sincerely
Markus Buchta



[GitHub] jena pull request #273: JENA-1372: fn:apply and fn:collation-key

2017-08-18 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/273#discussion_r133971637
  
--- Diff: 
jena-base/src/main/java/org/apache/jena/atlas/lib/cache/Cache0.java ---
@@ -39,7 +39,13 @@ public V getIfPresent(K key) {
 
 @Override
 public V getOrFill(K key, Callable callable) {
-return null ;
+try {
+return callable.call() ;
+}
+catch (Exception e) {
+e.printStackTrace();
--- End diff --

`printStackTrace()`? Isn't it better to use a logger?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] jena pull request #273: JENA-1372: fn:apply and fn:collation-key

2017-08-18 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/273#discussion_r133970438
  
--- Diff: 
jena-arq/src/main/java/org/apache/jena/sparql/function/library/FN_Apply.java ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.sparql.function.library;
+
+import java.util.List ;
+
+import org.apache.jena.atlas.lib.Cache ;
+import org.apache.jena.atlas.lib.CacheFactory ;
+import org.apache.jena.graph.Node ;
+import org.apache.jena.sparql.expr.ExprEvalException ;
+import org.apache.jena.sparql.expr.ExprList ;
+import org.apache.jena.sparql.expr.NodeValue ;
+import org.apache.jena.sparql.function.Function ;
+import org.apache.jena.sparql.function.FunctionBase ;
+import org.apache.jena.sparql.function.FunctionFactory ;
+import org.apache.jena.sparql.function.FunctionRegistry ;
+import org.apache.jena.sparql.sse.builders.ExprBuildException ;
+import org.apache.jena.sparql.util.Context ;
+
+/** XPath and XQuery Functions and Operators 3.1
+ *  
+ * {@code fn:apply(function, args)}
+ */
+public class FN_Apply extends FunctionBase {
+// Assumes one object per use site. 
+private Cache cache1 = 
CacheFactory.createOneSlotCache();
+
+@Override
+public void checkBuild(String uri, ExprList args) {
+if ( args.isEmpty() )
+throw new ExprBuildException("fn:apply: no function to call 
(minimum number of args is one)");
+}
+@Override
+public NodeValue exec(List args) {
+if ( args.isEmpty() )
+throw new ExprBuildException("fn:apply: no function to call 
(minimum number of args is one)");
+NodeValue functionId = args.get(0);
+List argExprs = args.subList(1,args.size()) ; 
+ExprList exprs = new ExprList();
+argExprs.forEach((a)->exprs.add(a));
--- End diff --

Maybe `argExprs.forEach(exprs::add);`?

 It's not clear to me why not `ExprList exprs = new ExprList(argExprs);`. 
I'm sure there's a reason-- I must be missing something in the execution flow? 
Why copy instead of view?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


<    1   2   3   4   5   6   7   8   9   10   >