Re: [JENA-DEV] SPARQL - Way to concat several property values

2018-02-20 Thread Paul Tyson
Data and query samples would help. (I could not see the image.)

But from your problem description, you might try GROUP BY and the
GROUP_CONCAT aggregate function. This would put (for example) all the
properties of a subject in one result field, separated by the delimiter
of your choice.

See the discussion of aggregates in SPARQL at
https://www.w3.org/TR/2013/REC-sparql11-query-20130321/#aggregates
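
As a rough illustration of the GROUP BY / GROUP_CONCAT idea, here is a minimal sketch. The ex: prefix, the ex:category and ex:name properties, and the sample rows are assumptions made up for the example, since no data or query was posted. The class holds the query shape as a string and, so that it runs without Jena, emulates in plain Java what the aggregate would return.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class GroupConcatSketch {
    // Query shape only. The ex: prefix and the ex:category / ex:name
    // properties are hypothetical; substitute the real vocabulary.
    static final String QUERY = String.join("\n",
        "PREFIX ex: <http://example.org/ns#>",
        "SELECT ?category (GROUP_CONCAT(?name; SEPARATOR=\", \") AS ?names)",
        "WHERE {",
        "  ?entity ex:category ?category ;",
        "          ex:name     ?name .",
        "}",
        "GROUP BY ?category");

    // Plain-Java emulation of what GROUP BY + GROUP_CONCAT returns:
    // each row is {groupKey, value}; values sharing a key are joined.
    static Map<String, String> groupConcat(List<String[]> rows, String separator) {
        Function<String[], String> key = row -> row[0];
        Function<String[], String> value = row -> row[1];
        return rows.stream().collect(Collectors.groupingBy(
            key,
            LinkedHashMap::new,
            Collectors.mapping(value, Collectors.joining(separator))));
    }

    public static void main(String[] args) {
        // "LegalEntity" is a stand-in for the shared column value.
        List<String[]> rows = Arrays.asList(
            new String[] {"LegalEntity", "Apple"},
            new String[] {"LegalEntity", "Microsoft"},
            new String[] {"LegalEntity", "John Doe"});
        System.out.println(QUERY);
        groupConcat(rows, ", ")
            .forEach((k, names) -> System.out.println(k + " | " + names));
    }
}
```

With Jena, the QUERY string would be handed to the usual query execution API instead of being printed. Note that SPARQL does not define the order of values inside a GROUP_CONCAT result, so the order seen in the spreadsheet may vary.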

Regards,
--Paul

On Tue, 2018-02-20 at 13:39 +0100, Brice Sommacal wrote:
> Hi dear Jena users,
>
> Here is a question about how to output a SPARQL result set into a
> Google spreadsheet with concatenated values.
>
> We have been using a SPARQL result set to feed a Google spreadsheet.
> The result is below. There is nothing new: we get one result line for
> each match found in our Jena Model.
>
> [image: Embedded image 1]
>
> However, we would like to get concatenated values for equivalent
> rows.
> As you can see from rows 1 to 5 in SPARQL QUERY RESULT, we have found 2
> results for Apple, 2 for Microsoft and 1 for John Doe. All these lines
> have one value in common: "Entité morale réalisant des activités" (i.e.
> the middle column, column 3).
> In EXPECTED RESULT, row 1, these values have been concatenated so that
> the value "Entité morale réalisant des activités" appears only once.
>
> Some time ago, I was able to handle this exact use case by
> developing a specific API that goes through the result set and outputs
> it to a tabular file. However, I would like to keep best practices in
> place, and I wonder if there is a newer way to solve this using
> SPARQL.
>
> I was thinking of using a SPARQL sub-select, but the last time I used
> string functions, I did not get what I wanted. Do you have an
> example showing how to achieve this result?
> I may be missing other solutions offered by Jena and ARQ. Any
> other way to solve this will be considered.
>
> Your insights and help would be very much appreciated.
> With best regards,
>
> Brice




Save the date: ApacheCon North America, September 24-27 in Montréal

2018-02-20 Thread Rich Bowen

Dear Apache Enthusiast,

(You’re receiving this message because you’re subscribed to a user@ or 
dev@ list of one or more Apache Software Foundation projects.)


We’re pleased to announce the upcoming ApacheCon [1] in Montréal, 
September 24-27. This event is all about you — the Apache project community.


We’ll have four tracks of technical content this time, as well as lots 
of opportunities to connect with your project community, hack on the 
code, and learn about other related (and unrelated!) projects across the 
foundation.


The Call For Papers (CFP) [2] and registration are now open. Register 
early to take advantage of the early bird prices and secure your place 
at the event hotel.


Important dates
March 30: CFP closes
April 20: CFP notifications sent
August 24: Hotel room block closes (please do not wait until the last
minute)


Follow @ApacheCon on Twitter to be the first to hear announcements about 
keynotes, the schedule, evening events, and everything you can expect to 
see at the event.


See you in Montréal!

Sincerely, Rich Bowen, V.P. Events,
on behalf of the entire ApacheCon team

[1] http://www.apachecon.com/acna18
[2] https://cfp.apachecon.com/conference.html?apachecon-north-america-2018


Re: Reasoners for RDFS + owl:sameAs: performance, stability & best practices

2018-02-20 Thread Alexis Armin Huf
Hi Andreas,

I had a similar scenario, but as Dave said there is no off-the-shelf
(choose-an-enum-value) reasoner for that. What I did was pick the RDFS and
owl:sameAs rules from the rule files in the Jena source and instantiate a
GenericRuleReasoner with my custom rule file. Docs for how to do this are here:
https://jena.apache.org/documentation/inference/index.html#rules. Below is
a short walkthrough of how I set this up.

First, build your rules file. Look at the files in
jena-core/src/main/resources/etc, especially rdfs-fb.rules and
owl-fb-mini.rules. Your rule file will look like this:

-> tableAll().

[rdfs7b: (?a rdf:type rdfs:Class) -> (?a rdfs:subClassOf rdfs:Resource)]

[rdfs2:  (?p rdfs:domain ?c) -> [(?x rdf:type ?c) <- (?x ?p ?y)] ]
[rdfs3:  (?p rdfs:range ?c)  -> [(?y rdf:type ?c) <- (?x ?p ?y)] ]
[rdfs5a: (?a rdfs:subPropertyOf ?b), (?b rdfs:subPropertyOf ?c) -> (?a rdfs:subPropertyOf ?c)]
[rdfs5b: (?a rdf:type rdf:Property) -> (?a rdfs:subPropertyOf ?a)]
# ... and this goes on ...

# There are a lot of details around owl:sameAs, but you probably will need these:
[sameAs1: (?A owl:sameAs ?B) -> (?B owl:sameAs ?A) ]
[sameAs2: (?A owl:sameAs ?B) (?B owl:sameAs ?C) -> (?A owl:sameAs ?C) ]
[equality1: (?X owl:sameAs ?Y), notEqual(?X,?Y) ->
[(?X ?P ?V) <- (?Y ?P ?V)]
[(?V ?P ?X) <- (?V ?P ?Y)] ]

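Save this file as a resource of your application, parse it and create a
GenericRuleReasoner, like this:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

// Inside a method that returns a Model:
ClassLoader loader = SomeClass.class.getClassLoader();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        loader.getResourceAsStream("rules/rdfs+sameAs.rules")))) {
    List<Rule> rules = Rule.parseRules(Rule.rulesParserFromReader(reader));
    GenericRuleReasoner reasoner = new GenericRuleReasoner(rules);
    return ModelFactory.createModelForGraph(reasoner.bind(modelThatNeedsReasoning.getGraph()));
}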

Hope that helps!



Dave Reynolds wrote on Tue, 20 Feb 2018 at 06:03:

> Hi Andreas,
>
> Jena does not currently have any alternative built-in reasoner for RDFS +
> owl:sameAs and I'm not aware of any such "equality reasoner" being in
> development. You could try Pellet, which may offer better performance.
>
> In fact, equality reasoning is notoriously expensive in the general case.
> The logic is indeed simple, but the cost can blow up easily because it
> leads to a combinatorial number of deductions.
>
> Depending on what problem you are trying to solve, your best bet may be
> to avoid using owl:sameAs reasoning at run time altogether. For example,
> in some cases it may be possible to do a pass over the data at ingest
> time to identify all aliases and to assert in the model only one
> "canonical" URI for each alias equivalence set.
>
> Dave
>
> On 20/02/18 07:17, Andreas Kahl wrote:
> > Hello everyone,
> >
> > I am currently developing a little Jena Model that should be able to do
> > RDFS inferencing plus owl:sameAs. From the documentation I learned that
> > the minimal Reasoner for that is OWLmini. During development I
> > experienced some severe performance bottlenecks if a runtime model
> > contains too many owl:sameAs links and generally for nearly all models
> > exceeding 1000 Statements. Most of the tests simply freeze at some point
> > if those performance bottlenecks occur, sometimes selecting a Statement
> > with a SimpleSelector consisting of a subject URI, a predicate URI and a
> > null Object takes 20secs.
> > There should be no problems with blocking of threads as I run my
> > integration tests single threaded - especially if I am experiencing
> > failures.
> >
> > I could confine this by using models without inferencing while
> > collecting and adding data spidered from the web, and especially adding
> > Ontologies last, only where absolutely needed. Also I use a whitelist
> > internally for domains my spider is allowed to fetch data from;
> > therefore I remove all owl:sameAs Statements containing object URIs not
> > in this whitelist. In the end, in my querying methods, I clone that
> > basic model with the collected data and add it to an InfModel:
> >
> > protected static Model getInfModelFrom(Model model) {
> >     final long size = model.size();
> >     LOG.debug("getInfModelFrom: Input size: " + Long.toString(size));
> >     final Model copy = ModelFactory.createDefaultModel();
> >     copy.add(model instanceof InfModel ? ((InfModel) model).getRawModel() : model);
> >     final InfModel infModel =
> >             ModelFactory.createInfModel(ReasonerRegistry.getOWLMiniReasoner(), copy);
> >     return infModel;
> > }
> >
> > The only Ontology I am using is
> > http://d-nb.info/standards/elementset/gnd# .
> >
> > I suppose that the Reasoner I use is much too mighty for the seemingly
> > simple owl:sameAs. Is there any more basic option understanding
> > owl:sameAs besides RDFS? All other OWL Axioms are not 

[JENA-DEV] SPARQL - Way to concat several property values

2018-02-20 Thread Brice Sommacal
Hi dear Jena users,

Here is a question about how to output a SPARQL result set into a
Google spreadsheet with concatenated values.

We have been using a SPARQL result set to feed a Google spreadsheet.
The result is below. There is nothing new: we get one result line for each
match found in our Jena Model.

[image: Embedded image 1]

However, we would like to get concatenated values for equivalent rows.
As you can see from rows 1 to 5 in SPARQL QUERY RESULT, we have found 2
results for Apple, 2 for Microsoft and 1 for John Doe. All these lines have
one value in common: "Entité morale réalisant des activités" (i.e. the
middle column, column 3).
In EXPECTED RESULT, row 1, these values have been concatenated so that
the value "Entité morale réalisant des activités" appears only once.

Some time ago, I was able to handle this exact use case by
developing a specific API that goes through the result set and outputs it to
a tabular file. However, I would like to keep best practices in place, and
I wonder if there is a newer way to solve this using SPARQL.

I was thinking of using a SPARQL sub-select, but the last time I used
string functions, I did not get what I wanted. Do you have an example showing
how to achieve this result?
I may be missing other solutions offered by Jena and ARQ. Any other
way to solve this will be considered.

Your insights and help would be very much appreciated.
With best regards,

Brice


RE: Configuring fuseki with TDB2 and OWL reasoning

2018-02-20 Thread Nouwt, B. (Barry)
Hi Eric,

Another thing I noticed is that you said you were manually loading the data via
the GUI after Apache Jena Fuseki has started. Maybe that data is not stored in
the correct location (i.e. the baseModel of the InfModel), and that is why the
reasoning fails?

You could try to load the data automatically on Apache Jena Fuseki startup
using a configuration like the one below (or is that no longer possible when
using TDB2?):

<#model_inf> a ja:InfModel ;
...
ja:content <#test-inf> ;
...

<#test-inf> ja:externalContent  .

Regards, Barry


-Original Message-
From: Eric Boisvert [mailto:denevers1...@gmail.com] 
Sent: Monday, 19 February 2018 23:25
To: users@jena.apache.org
Subject: Re: Configuring fuseki with TDB2 and OWL reasoning

made the change.

Now I get

Result: failed with message "Not in a transaction"

when I try to load from the interface


log:

(...)

[2018-02-19 17:21:31] Fuseki INFO  [5] Filename: test.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=4 Triples=4 Quads=0
[2018-02-19 17:21:31] Fuseki INFO  [5] 500 Not in a transaction (29 ms)

Thank you very much for helping me with this

2018-02-19 9:00 GMT-05:00 Andy Seaborne :

>
>
> On 19/02/18 12:46, Eric Boisvert wrote:
>
>> :dataset a ja:RDFDataset ;
>>  tdb2:defaultGraph   <#model_inf> .
>> ## tdb2:location  "c:\\fuseki/run/databases/gsip".
>>
>>
> ja:defaultGraph  not  tdb2:defaultGraph
>
> :dataset is a plain, in-memory dataset to hold the InfModel
>
> Andy
>


Re: Configuring fuseki with TDB2 and OWL reasoning

2018-02-20 Thread Andy Seaborne

Eric,

Glad we have got the configuration sorted out.

This is now a different problem which isn't a matter of getting the 
right configuration.  I don't know what's going on yet; it's also not a 
quick thing to look at and address.


I've recorded it as

https://issues.apache.org/jira/browse/JENA-1492

Thank you for the report,

Andy

On 19/02/18 22:25, Eric Boisvert wrote:

made the change.

Now I get

Result: failed with message "Not in a transaction"

when I try to load from the interface


log:

(...)

[2018-02-19 17:21:31] Fuseki INFO  [5] Filename: test.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=4 Triples=4 Quads=0
[2018-02-19 17:21:31] Fuseki INFO  [5] 500 Not in a transaction (29 ms)

Thank you very much for helping me with this

2018-02-19 9:00 GMT-05:00 Andy Seaborne :

On 19/02/18 12:46, Eric Boisvert wrote:

:dataset a ja:RDFDataset ;
  tdb2:defaultGraph   <#model_inf> .
## tdb2:location  "c:\\fuseki/run/databases/gsip".

ja:defaultGraph  not  tdb2:defaultGraph

:dataset is a plain, in-memory dataset to hold the InfModel

 Andy





Re: Reasoners for RDFS + owl:sameAs: performance, stability & best practices

2018-02-20 Thread Dave Reynolds

Hi Andreas,

Jena does not currently have any alternative built-in reasoner for RDFS +
owl:sameAs and I'm not aware of any such "equality reasoner" being in
development. You could try Pellet, which may offer better performance.

In fact, equality reasoning is notoriously expensive in the general case.
The logic is indeed simple, but the cost can blow up easily because it
leads to a combinatorial number of deductions.

Depending on what problem you are trying to solve, your best bet may be
to avoid using owl:sameAs reasoning at run time altogether. For example,
in some cases it may be possible to do a pass over the data at ingest
time to identify all aliases and to assert in the model only one
"canonical" URI for each alias equivalence set.


Dave
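
The ingest-time pass suggested above (pick one canonical URI per alias set, then assert only that URI) could be sketched roughly as follows. This is an assumption-laden illustration, not something from Jena itself: the class name, the method names and the choice of the lexicographically smallest URI as representative are all made up for the example, and it uses a plain union-find structure over URI strings rather than any Jena API.

```java
import java.util.HashMap;
import java.util.Map;

class SameAsCanonicalizer {
    // Maps each URI to its parent in the union-find forest.
    private final Map<String, String> parent = new HashMap<>();

    /** Returns the representative URI of the alias set containing uri. */
    String canonical(String uri) {
        parent.putIfAbsent(uri, uri);
        String root = uri;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every visited node straight at the root.
        String current = uri;
        while (!current.equals(root)) {
            String next = parent.get(current);
            parent.put(current, root);
            current = next;
        }
        return root;
    }

    /** Records one owl:sameAs link between a and b. */
    void addSameAs(String a, String b) {
        String rootA = canonical(a);
        String rootB = canonical(b);
        if (rootA.equals(rootB)) {
            return;
        }
        // Keep the lexicographically smaller URI as representative so the
        // result does not depend on the order in which links are seen.
        if (rootA.compareTo(rootB) < 0) {
            parent.put(rootB, rootA);
        } else {
            parent.put(rootA, rootB);
        }
    }

    public static void main(String[] args) {
        SameAsCanonicalizer aliases = new SameAsCanonicalizer();
        aliases.addSameAs("http://example.org/b", "http://example.org/a");
        aliases.addSameAs("http://example.org/c", "http://example.org/b");
        // prints http://example.org/a
        System.out.println(aliases.canonical("http://example.org/c"));
    }
}
```

At ingest time, every subject and object URI would be passed through canonical() before the statement is added to the model, so no owl:sameAs reasoning is needed when querying.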

On 20/02/18 07:17, Andreas Kahl wrote:

Hello everyone,

I am currently developing a little Jena Model that should be able to do
RDFS inferencing plus owl:sameAs. From the documentation I learned that
the minimal Reasoner for that is OWLmini. During development I
experienced some severe performance bottlenecks if a runtime model
contains too many owl:sameAs links and generally for nearly all models
exceeding 1000 Statements. Most of the tests simply freeze at some point
if those performance bottlenecks occur, sometimes selecting a Statement
with a SimpleSelector consisting of a subject URI, a predicate URI and a
null Object takes 20secs.
There should be no problems with blocking of threads as I run my
integration tests single threaded - especially if I am experiencing
failures.

I could confine this by using models without inferencing while
collecting and adding data spidered from the web, and especially adding
Ontologies last, only where absolutely needed. Also I use a whitelist
internally for domains my spider is allowed to fetch data from;
therefore I remove all owl:sameAs Statements containing object URIs not
in this whitelist. In the end, in my querying methods, I clone that
basic model with the collected data and add it to an InfModel:

protected static Model getInfModelFrom(Model model) {
    final long size = model.size();
    LOG.debug("getInfModelFrom: Input size: " + Long.toString(size));
    final Model copy = ModelFactory.createDefaultModel();
    copy.add(model instanceof InfModel ? ((InfModel) model).getRawModel() : model);
    final InfModel infModel =
            ModelFactory.createInfModel(ReasonerRegistry.getOWLMiniReasoner(), copy);
    return infModel;
}
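
For readers who want to reproduce the whitelist step described above (dropping owl:sameAs statements whose object URI is outside the allowed domains), a minimal sketch might look like this. It is not the code from the original application: the class name, the use of String arrays as a stand-in for Jena Statements, and the example hosts are all assumptions.

```java
import java.net.URI;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class SameAsWhitelistFilter {
    static final String OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

    /**
     * Each triple is {subject, predicate, object} as URI strings, a
     * stand-in for a Jena Statement. Non-sameAs triples are always kept;
     * owl:sameAs triples are kept only if the object host is allowed.
     */
    static List<String[]> filter(List<String[]> triples, Set<String> allowedHosts) {
        return triples.stream()
            .filter(t -> !OWL_SAME_AS.equals(t[1])
                      || allowedHosts.contains(hostOf(t[2])))
            .collect(Collectors.toList());
    }

    /** Returns the host part of a URI, or "" if it has none or is malformed. */
    static String hostOf(String uri) {
        try {
            String host = URI.create(uri).getHost();
            return host == null ? "" : host;
        } catch (IllegalArgumentException e) {
            return "";
        }
    }
}
```

Filtering before the data reaches the InfModel keeps the number of owl:sameAs links, and therefore the number of deductions the reasoner has to make, as small as possible.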

The only Ontology I am using is
http://d-nb.info/standards/elementset/gnd# .

I suppose that the Reasoner I use is much too mighty for the seemingly
simple owl:sameAs. Is there any more basic option that understands
owl:sameAs in addition to RDFS? All other OWL axioms are not needed.
Are there any best practices for inferencing over relatively
small in-memory models of <10,000 Statements (most <5,000 Statements)? I
found some information on the web that a simple 'Equality Reasoner' is
in the works. Would that be a good choice? Will it be available any time
soon?

Thanks for any hints
Andreas