[jira] [Commented] (JENA-1277) Spatial Queries Very Slow For Large Databases

2017-01-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822245#comment-15822245
 ] 

ASF GitHub Bot commented on JENA-1277:
--

Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/205
  
@osma In the case you mention, shouldn't the user be explicitly ordering 
anyway?


> Spatial Queries Very Slow For Large Databases
> -
>
> Key: JENA-1277
> URL: https://issues.apache.org/jira/browse/JENA-1277
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: Spatial
>Affects Versions: Jena 3.1.1
> Environment: Linux Ubuntu
>Reporter: samur araujo
>Assignee: Osma Suominen
> Attachments: spatial-assembler.ttl
>
>
> I loaded GeoNames into Jena, but the spatial queries take more than 3s to 
> execute. The query is below:
> PREFIX spatial: 
> PREFIX rdfs: 
> SELECT distinct ?place
> {
> ?place spatial:intersectBox (32.55668 -117.12865 32.56668  -117.13865) .
>   
> }
> The data can be downloaded here:
> https://drive.google.com/file/d/0B-fwYPJYT1GOYVVIZF9ROUxzclk/view?usp=sharing
> For small datasets the queries are executed in 200ms, very fast. I noticed 
> that when I access the Lucene index directly the queries are also very fast, 
> about 20ms. 
> The issue may be related to the post-processing of Lucene results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] jena issue #205: JENA-1277: don't use sorting in spatial queries, for much b...

2017-01-13 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/205
  
@osma In the case you mention, shouldn't the user be explicitly ordering 
anyway?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] jena pull request #202: Formatting rampage

2017-01-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/202




[jira] [Commented] (JENA-1277) Spatial Queries Very Slow For Large Databases

2017-01-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822027#comment-15822027
 ] 

ASF GitHub Bot commented on JENA-1277:
--

Github user osma commented on the issue:

https://github.com/apache/jena/pull/205
  
Thanks @afs for the approval. I will wait a few more days and then merge 
this, unless there are objections.

The only scenario I'm mildly concerned about is if someone performs a 
spatial query for a large area with a low limit parameter, so that the result 
list gets cut off. Currently the top K results will (or at least should) be 
from near the center of the area, but with the sorting disabled, the selection 
of the top K results will become random.

I have to wonder why the sorting is so slow though. I don't know much about 
spatial data processing, but I suppose that calculating the distances of ~2500 
coordinates to a given center point, and then sorting the list of 2500 results 
by those distances, shouldn't take 20 seconds. More like 20 milliseconds...
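
A rough way to sanity-check that estimate is to do the distance calculation and the sort in plain Java. The sketch below is not jena-spatial or Lucene code: the `Hit` type, the `nearest` helper and the random test points are made up for illustration, and the haversine formula assumes a spherical Earth. It only shows that sorting about 2500 points by distance to a centre and keeping the top K is a small amount of work, which suggests the 20 seconds are spent somewhere else.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class DistanceSortSketch {

    /** Hypothetical hit type: an identifier plus latitude/longitude in degrees. */
    public static final class Hit {
        public final String id;
        public final double lat;
        public final double lon;

        public Hit(String id, double lat, double lon) {
            this.id = id;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Great-circle distance in kilometres (haversine formula, spherical Earth). */
    public static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        final double earthRadiusKm = 6371.0;
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 to guard against rounding pushing the argument of asin out of range.
        return 2 * earthRadiusKm * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Returns the k hits nearest to the centre, closest first. */
    public static List<Hit> nearest(List<Hit> hits, double centerLat, double centerLon, int k) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble(
                (Hit h) -> haversineKm(centerLat, centerLon, h.lat, h.lon)));
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    public static void main(String[] args) {
        // 2500 pseudo-random points in a one-degree box; the values are arbitrary.
        Random rnd = new Random(42);
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            hits.add(new Hit("p" + i, 32.0 + rnd.nextDouble(), -118.0 + rnd.nextDouble()));
        }
        long start = System.nanoTime();
        List<Hit> top = nearest(hits, 32.5, -117.5, 10);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println("kept " + top.size() + " of " + hits.size() + " hits in " + micros + " us");
    }
}
```

The same helper also illustrates the concern about a low limit: without the sort, `subList(0, k)` would simply return whichever K hits happened to come first.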




[GitHub] jena issue #205: JENA-1277: don't use sorting in spatial queries, for much b...

2017-01-13 Thread osma
Github user osma commented on the issue:

https://github.com/apache/jena/pull/205
  
Thanks @afs for the approval. I will wait a few more days and then merge 
this, unless there are objections.

The only scenario I'm mildly concerned about is if someone performs a 
spatial query for a large area with a low limit parameter, so that the result 
list gets cut off. Currently the top K results will (or at least should) be 
from near the center of the area, but with the sorting disabled, the selection 
of the top K results will become random.

I have to wonder why the sorting is so slow though. I don't know much about 
spatial data processing, but I suppose that calculating the distances of ~2500 
coordinates to a given center point, and then sorting the list of 2500 results 
by those distances, shouldn't take 20 seconds. More like 20 milliseconds...




[GitHub] jena issue #202: Formatting rampage

2017-01-13 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/202
  
@afs my laptop is on the fritz, so I will try to find a way to merge this 
later today, but you are welcome to! :)




[GitHub] jena issue #204: One writable graph per thread/transaction dataset

2017-01-13 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/204
  
I will retarget this at jena-extras. It does make more sense.




[GitHub] jena issue #204: One writable graph per thread/transaction dataset

2017-01-13 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/204
  
That's a really fair point about providing some use cases. An example that 
occurs to me is RDF-based persistence where Java entities are being saved into 
a dataset. I think @claudenw has something like that going on in PA4RDF. This 
kind of dataset could act as a simple but powerful cache for that kind of 
system.




Re: [GitHub] jena issue #204: One writable graph per thread/transaction dataset

2017-01-13 Thread Andy Seaborne

+1 to extras.

I think there is a lot of merit in putting small things in extras.  It is 
easier to point at and say "new - subject to change".


So do separate GitHub repos - especially where the early status is 
unstable, so that access for the most interested people during development 
is not affected by Jena release cycles.



> One of the questions I'm trying to raise with this PR is exactly
> that-- is this useful only for LDP-type workloads (in which case maybe
> it belongs outside ARQ entirely) or not (in which case it has more
> claim to be in ARQ)?

How about describing some use case where you think it might be helpful?

Just having something in the codebase does not really ask the question 
you have about what might be - code tends to ask "does this exact thing 
do ..."


Andy

On 11/01/17 20:08, A. Soroka wrote:

Sure, that would be natural. Let me put the question this way: is a per-graph 
arrangement of this kind interesting to anyone who isn't interested in LDP?

The other direction here is forward with respect to locking. Claude and others 
(including me) have thrown around ideas on the list about how we could 
introduce more finely-grained locking for datasets, and I definitely think of 
this as a first tiny baby step in that direction.
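
To make the idea of per-graph locking concrete, here is a minimal sketch in plain Java. It is not the code from PR #204 and does not use any Jena API: the class name, the `write` and `size` methods, and the use of string lists as stand-ins for graphs are all assumptions made for illustration. Each named graph gets its own lock, so writers to different graphs do not block each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

public class PerGraphLockSketch {

    // One lock per graph name, created lazily on first use.
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Stand-in for graph storage: each "graph" is just a list of strings here.
    private final Map<String, List<String>> graphs = new ConcurrentHashMap<>();

    /** Runs the update while holding only the lock of the named graph. */
    public void write(String graphName, Consumer<List<String>> update) {
        ReentrantLock lock = locks.computeIfAbsent(graphName, g -> new ReentrantLock());
        lock.lock();
        try {
            update.accept(graphs.computeIfAbsent(graphName, g -> new ArrayList<>()));
        } finally {
            lock.unlock();
        }
    }

    /** Returns the number of entries in the named graph, or 0 if it does not exist. */
    public int size(String graphName) {
        ReentrantLock lock = locks.computeIfAbsent(graphName, g -> new ReentrantLock());
        lock.lock();
        try {
            List<String> graph = graphs.get(graphName);
            return graph == null ? 0 : graph.size();
        } finally {
            lock.unlock();
        }
    }
}
```

A caller would use it as `ds.write("urn:g1", g -> g.add("some triple"))`. A real dataset would also need a story for the default graph, for reads that span several graphs, and for transactions, none of which this sketch attempts.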

---
A. Soroka
The University of Virginia Library


On Jan 11, 2017, at 2:58 PM, Claude Warren  wrote:

perhaps in extras?

On Wed, Jan 11, 2017 at 7:39 PM, ajs6f  wrote:


Github user ajs6f commented on the issue:

   https://github.com/apache/jena/pull/204

   `pergraph`: Just thought that `core` was getting awfully crowded. I
don't care one way or the other-- happy to put them anywhere.

   `jena-ldp`: One of the questions I'm trying to raise with this PR is
exactly that-- is this useful only for LDP-type workloads (in which case
maybe it belongs outside ARQ entirely) or not (in which case it has more
claim to be in ARQ)?







--
I like: Like Like - The likeliest place on the web

LinkedIn: http://www.linkedin.com/in/claudewarren




[jira] [Resolved] (JENA-1278) BulkLoader does not restore indexes if no items are loaded.

2017-01-13 Thread Andy Seaborne (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-1278.
-
   Resolution: Fixed
 Assignee: Andy Seaborne
Fix Version/s: Jena 3.2.0

> BulkLoader does not restore indexes if no items are loaded.
> ---
>
> Key: JENA-1278
> URL: https://issues.apache.org/jira/browse/JENA-1278
> Project: Apache Jena
>  Issue Type: Bug
>  Components: TDB
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
> Fix For: Jena 3.2.0
>
>
> The BulkLoader does not restore indexes if no items are loaded. Normally, 
> this does not matter (the main use is from tdbloader and the JVM will exit). 
> However, if called from an application, it might matter.
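
The failure mode described above can be pictured with a small stand-alone sketch. This is not the TDB BulkLoader code, and the exact shape of the original bug is an assumption: the `Store` class and both `load` methods are hypothetical. It contrasts a loader that only re-attaches secondary indexes when something was loaded with one that always re-attaches them.

```java
import java.util.ArrayList;
import java.util.List;

public class ReattachIndexesSketch {

    /** Hypothetical stand-in for a store with a primary index and detachable secondary indexes. */
    public static final class Store {
        private final List<String> primary = new ArrayList<>();
        private boolean secondaryAttached = true;

        public boolean secondaryAttached() { return secondaryAttached; }
        public int size() { return primary.size(); }

        void detachSecondary() { secondaryAttached = false; }
        void attachSecondary() { secondaryAttached = true; }
        void addToPrimary(String item) { primary.add(item); }
    }

    /** Problematic shape: the indexes are only re-attached when something was loaded. */
    public static void loadBuggy(Store store, List<String> items) {
        store.detachSecondary();
        for (String item : items) {
            store.addToPrimary(item);
        }
        if (!items.isEmpty()) {
            store.attachSecondary();
        }
    }

    /** Safer shape: always re-attach, even for an empty load or after a failure. */
    public static void loadFixed(Store store, List<String> items) {
        store.detachSecondary();
        try {
            for (String item : items) {
                store.addToPrimary(item);
            }
        } finally {
            store.attachSecondary();
        }
    }
}
```

For a command-line run the difference is invisible because the JVM exits straight after loading; an application that keeps using the store after an empty load would notice the missing indexes.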





[jira] [Resolved] (JENA-1279) Update jsonld-java version

2017-01-13 Thread Andy Seaborne (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-1279.
-
   Resolution: Fixed
 Assignee: Andy Seaborne
Fix Version/s: Jena 3.2.0

https://github.com/apache/jena/commit/40402b7ccfd63365d92ddce4f04a13a7cfec4ce7

> Update jsonld-java version
> --
>
> Key: JENA-1279
> URL: https://issues.apache.org/jira/browse/JENA-1279
> Project: Apache Jena
>  Issue Type: Task
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
> Fix For: Jena 3.2.0
>
>
> v0.9.0 has been released.





[jira] [Created] (JENA-1279) Update jsonld-java version

2017-01-13 Thread Andy Seaborne (JIRA)
Andy Seaborne created JENA-1279:
---

 Summary: Update jsonld-java version
 Key: JENA-1279
 URL: https://issues.apache.org/jira/browse/JENA-1279
 Project: Apache Jena
  Issue Type: Task
Reporter: Andy Seaborne


v0.9.0 has been released.





[jira] [Commented] (JENA-1278) BulkLoader does not restore indexes if no items are loaded.

2017-01-13 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821866#comment-15821866
 ] 

ASF subversion and git services commented on JENA-1278:
---

Commit 96df95e2f06606af3d27243e50a978cca99db53a in jena's branch 
refs/heads/master from [~andy.seaborne]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=96df95e ]

JENA-1278: Always attach the secondary indexes




[jira] [Created] (JENA-1278) BulkLoader does not restore indexes if no items are loaded.

2017-01-13 Thread Andy Seaborne (JIRA)
Andy Seaborne created JENA-1278:
---

 Summary: BulkLoader does not restore indexes if no items are loaded.
 Key: JENA-1278
 URL: https://issues.apache.org/jira/browse/JENA-1278
 Project: Apache Jena
  Issue Type: Bug
  Components: TDB
Reporter: Andy Seaborne


The BulkLoader does not restore indexes if no items are loaded. Normally, this 
does not matter (the main use is from tdbloader and the JVM will exit). 
However, if called from an application, it might matter.






[jira] [Assigned] (JENA-1277) Spatial Queries Very Slow For Large Databases

2017-01-13 Thread Osma Suominen (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Osma Suominen reassigned JENA-1277:
---

Assignee: Osma Suominen



[GitHub] jena pull request #205: JENA-1277: don't use sorting in spatial queries, for...

2017-01-13 Thread osma
GitHub user osma opened a pull request:

https://github.com/apache/jena/pull/205

JENA-1277: don't use sorting in spatial queries, for much better performance

This PR proposes removing the `distSort` parameter from the Lucene spatial 
query performed by jena-spatial. Dropping the sorting gives a massive 
performance boost; in the Geonames example given in JENA-1277, the query time 
drops from over 20 seconds to less than 200 ms.

I suppose that the sorting is not necessary since jena-spatial results are 
just raw material for the SPARQL engine anyway.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/osma/jena jena-spatial-no-sort

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/205.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #205


commit 864e2ce831f41a15ba683d0edd29cbe3236ff636
Author: Osma Suominen 
Date:   2017-01-13T12:29:16Z

JENA-1277: don't use sorting in spatial queries, for much better performance






[jira] [Commented] (JENA-1277) Spatial Queries Very Slow For Large Databases

2017-01-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821721#comment-15821721
 ] 

ASF GitHub Bot commented on JENA-1277:
--

GitHub user osma opened a pull request:

https://github.com/apache/jena/pull/205

JENA-1277: don't use sorting in spatial queries, for much better performance

This PR proposes removing the `distSort` parameter from the Lucene spatial 
query performed by jena-spatial. Dropping the sorting gives a massive 
performance boost; in the Geonames example given in JENA-1277, the query time 
drops from over 20 seconds to less than 200 ms.

I suppose that the sorting is not necessary since jena-spatial results are 
just raw material for the SPARQL engine anyway.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/osma/jena jena-spatial-no-sort

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/205.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #205


commit 864e2ce831f41a15ba683d0edd29cbe3236ff636
Author: Osma Suominen 
Date:   2017-01-13T12:29:16Z

JENA-1277: don't use sorting in spatial queries, for much better performance






[jira] [Comment Edited] (JENA-1277) Spatial Queries Very Slow For Large Databases

2017-01-13 Thread Osma Suominen (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821712#comment-15821712
 ] 

Osma Suominen edited comment on JENA-1277 at 1/13/17 12:28 PM:
---

I dropped the {{distSort}} parameter from the method call, and the query 
response time was reduced to below 200 ms!

I suppose the sort order of Lucene spatial results shouldn't matter in 
jena-spatial? After all, they are just raw material for the SPARQL engine and 
the real order is specified (or not) using ORDER BY.


was (Author: osma):
I dropped the {{distSort parameter}} from the method call, and the query 
response time was reduced to below 200 ms!

I suppose the sort order of Lucene spatial results shouldn't matter in 
jena-spatial? After all, they are just raw material for the SPARQL engine and 
the real order is specified (or not) using ORDER BY.



[jira] [Commented] (JENA-1277) Spatial Queries Very Slow For Large Databases

2017-01-13 Thread Osma Suominen (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821712#comment-15821712
 ] 

Osma Suominen commented on JENA-1277:
-

I dropped the {{distSort parameter}} from the method call, and the query 
response time was reduced to below 200 ms!

I suppose the sort order of Lucene spatial results shouldn't matter in 
jena-spatial? After all, they are just raw material for the SPARQL engine and 
the real order is specified (or not) using ORDER BY.



[jira] [Commented] (JENA-1277) Spatial Queries Very Slow For Large Databases

2017-01-13 Thread Osma Suominen (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821705#comment-15821705
 ] 

Osma Suominen commented on JENA-1277:
-

I did some basic profiling using current Jena (Fuseki 1.5.0-SNAPSHOT). The slow 
part, which takes ~20s, is this line in SpatialIndexLucene.java, i.e. the point 
where the actual Lucene search is done:

{noformat}
 TopDocs docs = indexSearcher.search(new MatchAllDocsQuery(), filter,
limit, distSort);
{noformat}

Of the parameters, 
* {{filter}} is a (Lucene)  {{IntersectsPrefixTreeFilter}}
* {{limit}} is 1 (the default)
* {{distSort}} is 
{{}},
 whatever that means.

I'm not sure how fast this operation should be on such a large data set. I've 
never used jena-spatial or Lucene spatial queries before...



[jira] [Commented] (JENA-1276) "loading remote context failed" reading JSON-LD

2017-01-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821701#comment-15821701
 ] 

ASF GitHub Bot commented on JENA-1276:
--

Github user fpservant commented on the issue:

https://github.com/apache/jena/pull/203
  
@afs thanks. JENA-1276 now closed


> "loading remote context failed" reading JSON-LD
> ---
>
> Key: JENA-1276
> URL: https://issues.apache.org/jira/browse/JENA-1276
> Project: Apache Jena
>  Issue Type: Bug
>  Components: ARQ
>Affects Versions: Jena 3.1.1
> Environment: mac osx 10.11, eclipse mars
>Reporter: François-Paul Servant
> Fix For: Jena 3.2.0
>
>
> getting org.apache.jena.riot.RiotException: loading remote context failed: 
> http://schema.org/ reading json-ld with an @context set to schema.org
> see https://github.com/apache/jena/pull/203
> the exception occurs here:
> com.github.jsonldjava.core.DocumentLoader.loadDocument(String) line 29
> where I get:
> org.apache.http.ConnectionClosedException: Premature end of Content-Length 
> delimited message body (expected: 124346; received: 0
> {noformat}
> import static org.junit.Assert.assertTrue;
> import java.io.StringReader;
> import org.apache.jena.rdf.model.Model;
> import org.apache.jena.rdf.model.ModelFactory;
> import org.apache.jena.riot.RiotException;
> import org.apache.jena.vocabulary.RDF;
> import org.junit.Test;
>
> public class TestJsonLDReader {
>     @Test public final void simpleReadTest() {
>         try {
>             String jsonld = someSchemaDorOrgJsonld();
>             Model m = ModelFactory.createDefaultModel();
>             try (StringReader reader = new StringReader(jsonld)) {
>                 m.read(reader, null, "JSON-LD");
>             }
>             assertJohnDoeIsOK(m);
>         } catch (RiotException e) {
>             // org.apache.jena.riot.RiotException: loading remote context failed: http://schema.org/
>             e.printStackTrace();
>         }
>     }
>     /** Example data */
>     private String someSchemaDorOrgJsonld() {
>         return "{\"@id\":\"_:b0\",\"@type\":\"Person\",\"name\":\"John Doe\",\"@context\":\"http://schema.org/\"}";
>     }
>     /** Checks that the data loaded from someSchemaDorOrgJsonld into a model is OK */
>     private void assertJohnDoeIsOK(Model m) {
>         assertTrue(m.contains(null, RDF.type, m.createResource("http://schema.org/Person")));
>         assertTrue(m.contains(null, m.createProperty("http://schema.org/name"), "John Doe"));
>     }
> }
> {noformat}





[GitHub] jena issue #203: JsonLDReader: possibility to override the @context

2017-01-13 Thread fpservant
Github user fpservant commented on the issue:

https://github.com/apache/jena/pull/203
  
@afs thanks. JENA-1276 now closed




[jira] [Closed] (JENA-1276) "loading remote context failed" reading JSON-LD

2017-01-13 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/JENA-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

François-Paul Servant closed JENA-1276.
---
   Resolution: Fixed
Fix Version/s: Jena 3.2.0

solved by upgrade of json-ld java from 0.8.3 to 0.9.0



[jira] [Updated] (JENA-1277) Spatial Queries Very Slow For Large Databases

2017-01-13 Thread Osma Suominen (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Osma Suominen updated JENA-1277:

Attachment: spatial-assembler.ttl

assembler configuration, including spatial index and Fuseki services



[jira] [Commented] (JENA-1277) Spatial Queries Very Slow For Large Databases

2017-01-13 Thread Osma Suominen (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821669#comment-15821669
 ] 

Osma Suominen commented on JENA-1277:
-

I tried reproducing this. It takes a while though since the data set is rather 
large. I will attach the assembler/fuseki configuration file that I used.

First I created the TDB using tdbloader2:

{noformat}
tdbloader2 --loc tdb geonames.nt.gz
{noformat}

This took 69 minutes on my i3-2330M Ubuntu 16.04 laptop with SSD.

Then I created the spatial index. I had to experiment a bit to find out how 
much memory it needed. Luckily 6G was enough; I'm on an 8G machine, so I 
couldn't have afforded much more:

{noformat}
java -Xmx6G -cp fuseki-server.jar jena.spatialindexer --desc=spatial-assembler.ttl
{noformat}

This took 19 minutes.

Next I ran Fuseki 1.4.1. I tweaked the fuseki-server startup script beforehand 
to give it 4G of memory, just in case.

{noformat}
./fuseki-server --config spatial-assembler.ttl
{noformat}

Finally I executed the query:
{noformat}
s-query --service=http://localhost:3030/ds/sparql --query query.rq --output=csv >results.csv
{noformat}

I ran this a few times and the response time varied between 18 and 27 seconds. 
I got 2469 results, not 17 as you said on the mailing list. I suspect that your 
spatial index is somehow incomplete, since you got fewer results in a shorter 
time.

In any case, I can confirm that this spatial query is really slow.
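
The spread reported above (18 to 27 seconds over a few runs) can be captured 
with a small timing harness. What follows is only a minimal sketch in Java and 
is not taken from the issue: the class name QueryTimer and the method timeRuns 
are hypothetical, and the task being timed is left as a Runnable so that it can 
wrap whatever actually sends the query, for example an HTTP request to the 
Fuseki endpoint.

```java
/** Hypothetical helper (not part of Jena): times repeated runs of a task. */
final class QueryTimer {

    /**
     * Runs the task the given number of times and returns a two-element
     * array: { fastest run, slowest run }, both in milliseconds.
     */
    static long[] timeRuns(Runnable task, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (int i = 0; i < runs; i++) {
            // nanoTime is monotonic, so it is safer than wall-clock time
            // for measuring elapsed intervals.
            long start = System.nanoTime();
            task.run();
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            min = Math.min(min, elapsedMillis);
            max = Math.max(max, elapsedMillis);
        }
        return new long[] { min, max };
    }

    public static void main(String[] args) {
        // Placeholder task (assumption): replace the empty body with the
        // code that sends the spatial query and reads the full response.
        Runnable task = () -> { };
        long[] range = timeRuns(task, 5);
        System.out.println("min=" + range[0] + " ms, max=" + range[1] + " ms");
    }
}
```

Reporting the fastest and slowest run, rather than a single figure, makes it 
easier to compare measurements taken on different machines or index builds.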






Re: Fwd: web access of the questions and answers

2017-01-13 Thread Osma Suominen

+1 to everything Andy said below.

I'm positively amazed by the patience and (relative) politeness of 
especially Lorenz on jena-user, and at the same time appalled by the 
quality of some of the questions.


I think that sometimes just linking to 
https://jena.apache.org/help_and_support/#how-to-ask-a-good-question 
would be enough, but of course, a polite and helpful answer is always 
better.


-Osma



13.01.2017, 12:50, Andy Seaborne wrote:

The traffic levels have been higher recently and are at the upper end of what is normal.

http://mail-archives.apache.org/mod_mbox/jena-users/

It is clear a few individuals are using the list before making even a
small effort locally.

I applaud the effort people are making to answer their questions.

We should encourage a

Complete, Minimal, Example

Just the act of making one can help the OP.

Splitting the lists is highly unlikely to make a difference.

Andy

-------- Forwarded Message --------
Subject: web access of the questions and answers
Date: Fri, 13 Jan 2017 11:06:20 +0100
From: Sandor Kopacsi 
Reply-To: us...@jena.apache.org
To: us...@jena.apache.org
CC: Osma Suominen 

Dear List-owners,

Due to the high number of the mails arriving to this list, I kindly ask
if there is a possibility to access the conversations through the web,
like at the Google group of Skosmos Users, where I can check only those
questions and answers, that are relevant to me.

Thanks in advance.

Best Regards,
Sandor




--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suomi...@helsinki.fi
http://www.nationallibrary.fi


Fwd: web access of the questions and answers

2017-01-13 Thread Andy Seaborne

The traffic levels have been higher recently and are at the upper end of what is normal.

http://mail-archives.apache.org/mod_mbox/jena-users/

It is clear a few individuals are using the list before making even a 
small effort locally.


I applaud the effort people are making to answer their questions.

We should encourage a

Complete, Minimal, Example

Just the act of making one can help the OP.

Splitting the lists is highly unlikely to make a difference.

Andy

-------- Forwarded Message --------
Subject: web access of the questions and answers
Date: Fri, 13 Jan 2017 11:06:20 +0100
From: Sandor Kopacsi 
Reply-To: us...@jena.apache.org
To: us...@jena.apache.org
CC: Osma Suominen 

Dear List-owners,

Due to the high number of the mails arriving to this list, I kindly ask 
if there is a possibility to access the conversations through the web, 
like at the Google group of Skosmos Users, where I can check only those 
questions and answers, that are relevant to me.


Thanks in advance.

Best Regards,
Sandor

--
Dr. Sandor Kopacsi
IT Software Designer

Vienna University Computer Center



[GitHub] jena issue #203: JsonLDReader: possibility to override the @context

2017-01-13 Thread afs
Github user afs commented on the issue:

https://github.com/apache/jena/pull/203
  
I've done the upgrade to jsonld-java 0.9.0 on master for you. It's slightly 
easier to keep upgrades separate from other changes because they need noting 
in the release cycle.





[jira] [Commented] (JENA-1276) "loading remote context failed" reading JSON-LD

2017-01-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/JENA-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821524#comment-15821524
 ] 

François-Paul Servant commented on JENA-1276:
-

jsonld-java 0.9.0 solves the issue

> "loading remote context failed" reading JSON-LD
> ---
>
> Key: JENA-1276
> URL: https://issues.apache.org/jira/browse/JENA-1276
> Project: Apache Jena
>  Issue Type: Bug
>  Components: ARQ
>Affects Versions: Jena 3.1.1
> Environment: mac osx 10.11, eclipse mars
>Reporter: François-Paul Servant
>
> getting org.apache.jena.riot.RiotException: loading remote context failed: 
> http://schema.org/ reading json-ld with an @context set to schema.org
> see https://github.com/apache/jena/pull/203
> the exception occurs here:
> com.github.jsonldjava.core.DocumentLoader.loadDocument(String) line 29
> where I get:
> org.apache.http.ConnectionClosedException: Premature end of Content-Length 
> delimited message body (expected: 124346; received: 0
> {noformat}
> import static org.junit.Assert.assertTrue;
>
> import java.io.StringReader;
>
> import org.apache.jena.rdf.model.Model;
> import org.apache.jena.rdf.model.ModelFactory;
> import org.apache.jena.riot.RiotException;
> import org.apache.jena.vocabulary.RDF;
> import org.junit.Test;
>
> public class TestJsonLDReader {
>     @Test public final void simpleReadTest() {
>         try {
>             String jsonld = someSchemaDorOrgJsonld();
>             Model m = ModelFactory.createDefaultModel();
>             try (StringReader reader = new StringReader(jsonld)) {
>                 m.read(reader, null, "JSON-LD");
>             }
>             assertJohnDoeIsOK(m);
>         } catch (RiotException e) {
>             // org.apache.jena.riot.RiotException: loading remote context failed: http://schema.org/
>             e.printStackTrace();
>         }
>     }
>     /** Example data */
>     private String someSchemaDorOrgJsonld() {
>         return "{\"@id\":\"_:b0\",\"@type\":\"Person\",\"name\":\"John Doe\",\"@context\":\"http://schema.org/\"}";
>     }
>     /** Checking that the data loaded from someSchemaDorOrgJsonld into a 
>      * model, is OK */
>     private void assertJohnDoeIsOK(Model m) {
>         assertTrue(m.contains(null, RDF.type, 
>                 m.createResource("http://schema.org/Person")));
>         assertTrue(m.contains(null, 
>                 m.createProperty("http://schema.org/name"), "John Doe"));
>     }
> }
> {noformat}


