Re: jena-project

2017-11-20 Thread Rob Vesse
+1

On 17/11/2017, 15:16, "ajs6f"  wrote:

I'm basically +1 to this-- jena-project was always confusing at best.

In theory, we could factor out some of those 932 lines with a Jena Maven 
BOM. Actually, that might be nice for integrators and those using 
apache-jena-lib.

ajs6f

> On Nov 17, 2017, at 10:12 AM, Andy Seaborne  wrote:
> 
> When we moved to one version for all modules, pressure of time pushed us 
to have jena-project as a copy of the old jena-parent.
> 
> Do we want to go the next step forward which is to merge jena-project 
into the top POM and drop the jena-project module?
> 
> It turns out to be quite easy to do.
> 
> PR for discussion:
>  https://github.com/apache/jena/pull/312
> 
> It does make the top POM quite large - 932 lines.
> 
> Thoughts?
> 
>Andy








[jira] [Commented] (JENA-1429) Error with # comments in SPARQL

2017-11-20 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259056#comment-16259056
 ] 

Andy Seaborne commented on JENA-1429:
-

You have to encode it.
{{#}} marks the start of the URI fragment. A SPARQL query containing a {{#}} must encode it 
as {{%23}} in the HTTP query string; otherwise the rest is treated as a URL fragment. URI 
fragments are not sent by the client in HTTP.
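
As an illustration only (not code from Fuseki or Jena), the sketch below percent-encodes a query containing a {{#}} comment before placing it in the query string. The endpoint URL and the class and method names are assumptions made up for this example; only {{java.net.URLEncoder}} is a real JDK class.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class EncodeSparqlQuery {
    // Hypothetical helper: builds a GET URL with the SPARQL text percent-encoded.
    static String queryUrl(String endpoint, String sparql) {
        try {
            // URLEncoder turns '#' into %23 (and spaces into '+'),
            // so the comment marker stays inside the query string.
            return endpoint + "?query=" + URLEncoder.encode(sparql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sparql = "SELECT * { ?s ?p ?o } # all triples";
        // Assumed local endpoint, for illustration only.
        System.out.println(queryUrl("http://localhost:3030/ds/sparql", sparql));
    }
}
```

Without the encoding step, an HTTP client would treat everything from the {{#}} onward as a fragment and would not send it to the server.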


> Error with # comments in SPARQL
> ---
>
> Key: JENA-1429
> URL: https://issues.apache.org/jira/browse/JENA-1429
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Fuseki
> Environment:  Fuseki 3.4.0 
>Reporter: Karima Rafes
>Priority: Trivial
>
> A comment in a SPARQL query takes the form of '#', outside an IRI or string, 
> and continues to the end of the line[1], but Fuseki returns a parse error (Fuseki 
> 3.4.0 (Build date: 2017-07-17T11:43:07+)).
> [1] https://www.w3.org/TR/rdf-sparql-query/#grammarComments



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (JENA-1429) Error with # comments in SPARQL

2017-11-20 Thread Rob Vesse (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Vesse resolved JENA-1429.
-
Resolution: Invalid



[jira] [Reopened] (JENA-1429) Error with # comments in SPARQL

2017-11-20 Thread Rob Vesse (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Vesse reopened JENA-1429:
-



[GitHub] jena pull request #313: JENA-1430: Read quads for ja:data by filename

2017-11-20 Thread afs
GitHub user afs opened a pull request:

https://github.com/apache/jena/pull/313

JENA-1430: Read quads for ja:data by filename

Add support for {{ja:data "filename"}}.

    Filenames given as strings and filenames given as URIs resolve differently 
(against the JVM cwd and the assembler file location, respectively), which can 
be confusing.
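
A rough sketch of the two resolution rules described above, using only JDK classes. This is not the code in the pull request; the class and method names, and the example paths, are assumptions for illustration.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

class DataFileResolution {
    // A plain string filename resolves against the JVM's current working directory.
    static Path resolveString(String filename) {
        return Paths.get(filename).toAbsolutePath().normalize();
    }

    // A relative URI resolves against the location of the assembler file (the base URI).
    static URI resolveUri(URI assemblerFile, String relative) {
        return assemblerFile.resolve(relative);
    }

    public static void main(String[] args) {
        // Hypothetical assembler file location.
        URI assembler = URI.create("file:///etc/fuseki/config.ttl");
        System.out.println(resolveString("data.nq"));                  // depends on where the JVM was started
        System.out.println(resolveUri(assembler, "data.nq").getPath()); // sibling of the assembler file
    }
}
```

The same name, "data.nq", can therefore point at two different files depending on how it is written in the assembler.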


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/afs/jena assembler-quads

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/313.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #313






---


[jira] [Commented] (JENA-1430) Quad loading for in-memory assemblers

2017-11-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259107#comment-16259107
 ] 

ASF GitHub Bot commented on JENA-1430:
--

GitHub user afs opened a pull request:

https://github.com/apache/jena/pull/313

JENA-1430: Read quads for ja:data by filename

Add support for {{ja:data "filename"}}.

    Filenames given as strings and filenames given as URIs resolve differently 
(against the JVM cwd and the assembler file location, respectively), which can 
be confusing.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/afs/jena assembler-quads

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/313.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #313






> Quad loading for in-memory assemblers
> -
>
> Key: JENA-1430
> URL: https://issues.apache.org/jira/browse/JENA-1430
> Project: Apache Jena
>  Issue Type: Bug
>  Components: ARQ
>Reporter: A. Soroka
>Assignee: A. Soroka
>
> In-memory dataset Assemblers should support loading quad files.





Re: Issues fixed in Apache Jena

2017-11-20 Thread Γεώργιος Δίγκας
Dear Andy and All,

Thank you very much for the information that you have provided to me. You 
really helped me a lot.
Below I list some of the most frequent issue types that I have found while I 
was examining the evolution of Jena.

Issue Name                                                                             Issues over time  Fixed  Currently Open
The members of an interface declaration or class should appear in a pre-defined order             52875  47530            5345
Sections of code should not be "commented out"                                                    35465  31905            3560
Method names should comply with a naming convention                                               32809  29448            3361
Constant names should comply with a naming convention                                             18397  16572            1825
String literals should not be duplicated                                                          18106  16390            1716
Standard outputs should not be used directly to log anything                                      16174  14506            1668
Exception handlers should preserve the original exceptions                                        10763   9747            1016
Source files should not have any duplicated blocks                                                 9141   8078            1063
Methods should not be empty                                                                        8653   7796             857
switch case clauses should not have too many lines of code                                         8280   7491             789
throws declarations should not be superfluous                                                      7012   6138             874
Class variable fields should not have public accessibility                                         6593   5954             639


  1.  The issue that appears the most frequently is: The members of an 
interface declaration or class should appear in a pre-defined order. "According 
to the Java Code Conventions as defined by Oracle, the members of a class or 
interface declaration should appear in the following order in the source files: 
1. Class and instance variables, 2.Constructors, and 3. Methods."
  2.  The second most frequent is: Sections of code should not be "commented 
out". "Programmers should not comment out code as it bloats programs and 
reduces readability.  Unused code should be deleted and can be retrieved from 
source control history if required."
  3.  Method names should comply with a naming convention. "Shared naming 
conventions allow teams to collaborate efficiently. This rule checks that all 
method names match the default provided regular expression ^[a-z][a-zA-Z0-9]*$"
  4.  Constant names should comply with a naming convention. "Shared coding 
conventions allow teams to collaborate efficiently. This rule checks that all 
constant names match the default regular expression 
^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$"
  5.  String literals should not be duplicated. "Duplicated string literals 
make the process of refactoring error-prone, since you must be sure to update 
all occurrences.  On the other hand, constants can be referenced from many 
places, but only need to be updated in a single place."
  6.  Standard outputs should not be used directly to log anything. "When 
logging a message there are several important requirements which must be 
fulfilled:
* The user must be able to easily retrieve the logs
* The format of all logged message must be uniform to allow the user to easily 
read the log
* Logged data must actually be recorded
* Sensitive data must only be logged securely
If a program directly writes to the standard outputs, there is absolutely no 
way to comply with those requirements. That's why defining and using a 
dedicated logger is highly recommended."
  7.  Exception handlers should preserve the original exceptions. "When 
handling a caught exception, the original exception's message and stack trace 
should be logged or passed forward."
  8.  Source files should not have any duplicated blocks. "An issue is created 
on a file as soon as there is at least one block of duplicated code on this 
file"
  9.  Methods should not be empty. "There are several reasons for a method not 
to have a method body:
* It is an unintentional omission, and should be fixed to prevent an unexpected 
behavior in production.
* It is not yet, or never will be, supported. In this case an 
UnsupportedOperationException should be thrown.
* The method is an intentionally-blank override. In this case a nested comment 
should explain the reason for the blank override."
  10. switch case clauses should not have too many lines of code. "The switch 
statement should be used only to clearly define some new branches in the 
control flow. As soon as a case clause contains too many statements this highly 
decreases the readability of the overall control flow statement. In such case, 
the content of the case clause should be extracted into a dedicated method."
  11.
"throws

Re: Issues fixed in Apache Jena

2017-11-20 Thread Rob Vesse
Comments inline:

On 20/11/2017, 14:47, "Γεώργιος Δίγκας"  wrote:

Dear Andy and All,

Thank you very much for the information that you have provided to me. You 
really helped me a lot.
Below I list some of the most frequent issue types that I have found while 
I was examining the evolution of Jena.

Issue Name                                                                             Issues over time  Fixed  Currently Open
The members of an interface declaration or class should appear in a pre-defined order             52875  47530            5345
Sections of code should not be "commented out"                                                    35465  31905            3560
Method names should comply with a naming convention                                               32809  29448            3361
Constant names should comply with a naming convention                                             18397  16572            1825
String literals should not be duplicated                                                          18106  16390            1716
Standard outputs should not be used directly to log anything                                      16174  14506            1668
Exception handlers should preserve the original exceptions                                        10763   9747            1016
Source files should not have any duplicated blocks                                                 9141   8078            1063
Methods should not be empty                                                                        8653   7796             857
switch case clauses should not have too many lines of code                                         8280   7491             789
throws declarations should not be superfluous                                                      7012   6138             874
Class variable fields should not have public accessibility                                         6593   5954             639


1. The issue that appears the most frequently is: The members of an interface 
declaration or class should appear in a pre-defined order. "According to the 
Java Code Conventions as defined by Oracle, the members of a class or interface 
declaration should appear in the following order in the source files: 1. Class 
and instance variables, 2.Constructors, and 3. Methods."

 This is pretty nitpicky. I have rarely seen it enforced because, quite 
frankly, it is annoying to enforce and, with a modern IDE, somewhat irrelevant 
anyway.

2. The second most frequent is: Sections of code should not be "commented out". 
"Programmers should not comment out code as it bloats programs and reduces 
readability.  Unused code should be deleted and can be retrieved from source 
control history if required."

For projects with a long history (Jena started in 2001), this is pretty common. 
Often a programmer will leave a section of code commented out because they 
rewrote that section, can’t verify that they didn’t break something, and may 
want to go back to the old version easily should bugs be reported. Also, 
commented-out code is sometimes used to illustrate a naïve/simple 
implementation that makes it easier for programmers to understand the intent of 
the code, while the actual implementation is more performant but 
less understandable.

  3.  Method names should comply with a naming convention. "Shared naming 
conventions allow teams to collaborate efficiently. This rule checks that all 
method names match the default provided regular expression ^[a-z][a-zA-Z0-9]*$"
  4.  Constant names should comply with a naming convention. "Shared coding 
conventions allow teams to collaborate efficiently. This rule checks that all 
constant names match the default regular expression 
^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$"

These are both matters of personal preference. Some projects enforce such 
conventions while others do not. Again, with a history as long as ours, 
enforcing them is undesirable.

 I know that you are using the default here. If you are using the same approach 
to analyse other projects, you may want to customise these rules to match 
the actual conventions that a given project uses.
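
For reference, the two default patterns quoted above can be tried directly. This is a minimal sketch using only java.util.regex; the class and method names are made up for illustration, and it does not reproduce how the analysis tool itself applies the rules.

```java
import java.util.regex.Pattern;

class NamingRules {
    // Default patterns as quoted in the rule descriptions above.
    static final Pattern METHOD = Pattern.compile("^[a-z][a-zA-Z0-9]*$");
    static final Pattern CONSTANT = Pattern.compile("^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$");

    static boolean isMethodName(String name) {
        return METHOD.matcher(name).matches();
    }

    static boolean isConstantName(String name) {
        return CONSTANT.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isMethodName("deleteFrom"));     // true
        System.out.println(isMethodName("Delete_from"));    // false
        System.out.println(isConstantName("MIN_SRC_SIZE")); // true
        System.out.println(isConstantName("sliceSize"));    // false
    }
}
```

A project that, say, names constants in camelCase would be flagged on every such constant unless the pattern is changed to match its own convention.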

  5.  String literals should not be duplicated. "Duplicated string literals 
make the process of refactoring error-prone, since you must be sure to update 
all occurrences.  On the other hand, constants can be referenced from many 
places, but only need to be updated in a single place."

 I suspect that some of this comes from the fact that Jena consists of many 
modules and not every module depends on every other module, so there will 
inevitably be some duplication across modules. Also, string literals are used 
heavily in test cases, where there will be plenty of intentional duplication 
since each set of tests should be independent of the others.

  6.  Standard outputs should not be used directly to log anything. "When 
logging a message there are several important requirements which 

Re: CMS diff: Jena Full Text Search

2017-11-20 Thread ajs6f
I went to review this diff and rediscovered (to my chagrin) that I really know 
very little about Jena's text indexing.

Osma (or anyone else who knows text indexing better than I do, which wouldn't 
take much)-- could you review this? It has some really useful detail about how 
the indexing works and how it can be used.

ajs6f

> On Nov 20, 2017, at 1:51 AM, Chris Tomlinson  wrote:
> 
> Clone URL (Committers only):
> https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Fquery%2Ftext-query.mdtext
> 
> Chris Tomlinson
> 
> Index: trunk/content/documentation/query/text-query.mdtext
> ===
> --- trunk/content/documentation/query/text-query.mdtext   (revision 
> 1815762)
> +++ trunk/content/documentation/query/text-query.mdtext   (working copy)
> @@ -1,5 +1,7 @@
> Title: Jena Full Text Search
> 
> +Title: Jena Full Text Search
> +
> This extension to ARQ combines SPARQL and full text search via
> [Lucene](https://lucene.apache.org) 6.4.1 or
> [ElasticSearch](https://www.elastic.co) 5.2.1 (which is built on
> @@ -64,7 +66,20 @@
> ## Table of Contents
> 
> -   [Architecture](#architecture)
> +-   [External content](#external-content)
> +-   [External applications](#external-applications)
> +-   [Document structure](#document-structure)
> -   [Query with SPARQL](#query-with-sparql)
> +-   [Syntax](#syntax)
> +-   [Input arguments](#input-arguments)
> +-   [Output arguments](#output-arguments)
> +-   [Query strings](#query-strings)
> +-   [Simple queries](#simple-queries)
> +-   [Queries with language tags](#queries-with-language-tags)
> +-   [Queries that retrieve literals](#queries-that-retrieve-literals)
> +-   [Queries across multiple 
> `Field`s](#queries-across-multiple-fields)
> +-   [Queries within a `Field`](#queries-within-a-field)
> +-   [Good practice](#good-practice)
> -   [Configuration](#configuration)
> -   [Text Dataset Assembler](#text-dataset-assembler)
> -   [Configuring an analyzer](#configuring-an-analyzer)
> @@ -134,6 +149,69 @@
> By using Elasticsearch, other applications can share the text index with
> SPARQL search.
> 
> +### Document structure
> +
> +As mentioned above, text indexing of a triple involves associating a Lucene
> +document with the triple. How is this done?
> +
> +Lucene documents are composed of `Field`s. Indexing and searching are 
> performed 
> +over the contents of these `Field`s. For an RDF triple to be indexed in 
> Lucene the 
> +_property_ of the triple must be 
> +[configured in the entity map of a TextIndex](#entity-map-definition).
> +This associates a Lucene analyzer with the _`property`_ which will be used
> +for indexing and search. The _`property`_ becomes the _searchable_ Lucene 
> +`Field` in the resulting document.
> +
> +A Lucene index includes a _default_ `Field`, which is specified in the 
> configuration, 
> +that is the field to search if not otherwise named in the query. In 
> jena-text 
> +this field is configured via the `text:defaultField` property which is then 
> mapped 
> +to a specific RDF property via `text:predicate` (see [entity 
> map](#entity-map-definition) 
> +below).
> +
> +There are several additional `Field`s that will be included in the
> +document that is passed to the Lucene `IndexWriter` depending on the
> +configuration options that are used. These additional fields are used to
> +manage the interface between Jena and Lucene and are not generally 
> +searchable per se.
> +
> +The most important of these additional `Field`s is the `text:entityField`.
> +This configuration property defines the name of the `Field` that will contain
> +the _URI_ or _blank node id_ of the _subject_ of the triple being indexed. 
> This property does
> +not have a default and must be specified for most uses of `jena-text`. This
> +`Field` is often given the name, `uri`, in examples. It is via this `Field`
> +that `?s` is bound in a typical use such as:
> +
> +select ?s
> +where {
> +?s text:query "some text"
> +}
> +
> +Other `Field`s that may be configured: `text:uidField`, `text:graphField`,
> +and so on are discussed below.
> +
> +Given the triple:
> +
> +ex:SomeOne skos:prefLabel "zorn protégé a prés"@fr ;
> +
> +The following illustrates a Lucene document that Jena will create and
> +request Lucene to index:
> +
> +Document<
> +stored, indexed, indexOptions=DOCS http://example.org/SomeOne> 
> +indexed, omitNorms, indexOptions=DOCS 
>  
> +stored, indexed, tokenized  
> +stored, indexed, omitNorms, indexOptions=DOCS  
> +stored, indexed, tokenized  
> +stored, indexed, omitNorms, indexOptions=DOCS 
>  
> +stored, indexed, tokenized  
> +stored, indexed, omitNorms, indexOptions=DOCS  
> +stored, indexed, tokenized  
> +stored, indexed, omitNorms,

[jira] [Commented] (JENA-1430) Quad loading for in-memory assemblers

2017-11-20 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259478#comment-16259478
 ] 

Andy Seaborne commented on JENA-1430:
-

The document 

   https://jena.apache.org/documentation/rdf/datasets.html

says that the type is {{ja:MemoryDataset}} for TIM, but the code disagrees: there 
it is {{ja:DatasetTxnMem}}.

I prefer the documentation. Nowadays, the general dataset is marked 
transactional, and notes that it does not support abort.





[jira] [Commented] (JENA-1430) Quad loading for in-memory assemblers

2017-11-20 Thread A. Soroka (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259486#comment-16259486
 ] 

A. Soroka commented on JENA-1430:
-

That definitely makes sense. We want people to be able to assume (and work 
with) transactionality unless there's some reason not to.



[GitHub] jena pull request #314: JENA-1430

2017-11-20 Thread ajs6f
GitHub user ajs6f opened a pull request:

https://github.com/apache/jena/pull/314

JENA-1430

Includes #313, plus:

- Extend testing to `DatasetAssembler`
- Ensure that `DatasetAssembler` can also load quads
- Correct `ja:DatasetTxnMem` => `ja:MemoryDataset`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajs6f/jena JENA-1430p

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/314.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #314


commit d174ec04dccb205de96e63c775e01f948380f8cc
Author: Andy Seaborne 
Date:   2017-11-20T10:57:01Z

JENA-1430: Read quads for ja:data by filename

commit 3e13dc64f4047eb589d9da46e50561a25290a230
Author: ajs6f 
Date:   2017-11-20T18:47:42Z

JENA-1430: Quad loading for in-memory assemblers




---


[jira] [Commented] (JENA-1430) Quad loading for in-memory assemblers

2017-11-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259653#comment-16259653
 ] 

ASF GitHub Bot commented on JENA-1430:
--

GitHub user ajs6f opened a pull request:

https://github.com/apache/jena/pull/314

JENA-1430

Includes #313, plus:

- Extend testing to `DatasetAssembler`
- Ensure that `DatasetAssembler` can also load quads
- Correct `ja:DatasetTxnMem` => `ja:MemoryDataset`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajs6f/jena JENA-1430p

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/314.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #314


commit d174ec04dccb205de96e63c775e01f948380f8cc
Author: Andy Seaborne 
Date:   2017-11-20T10:57:01Z

JENA-1430: Read quads for ja:data by filename

commit 3e13dc64f4047eb589d9da46e50561a25290a230
Author: ajs6f 
Date:   2017-11-20T18:47:42Z

JENA-1430: Quad loading for in-memory assemblers






[GitHub] jena pull request #306: Algorithms for JENA-1414

2017-11-20 Thread ajs6f
Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/306#discussion_r152085996
  
--- Diff: jena-core/src/main/java/org/apache/jena/graph/GraphUtil.java ---
@@ -246,43 +282,214 @@ private static void deleteIteratorWorkerDirect(Graph 
graph, Iterator it)
 }
 }
 
-private static final int sliceSize = 1000 ;
-/** A safe and cautious remove() function that converts the remove to
- *  a number of {@link Graph#delete(Triple)} operations. 
+private static int MIN_SRC_SIZE   = 1000 ;
+// If source and destination are large, limit the search for the best 
way round to "deleteFrom" 
+private static int MAX_SRC_SIZE   = 1000*1000 ;
+private static int DST_SRC_RATIO  = 2 ;
+
+/**
+ * Delete triples in the destination (arg 1) as given in the source 
(arg 2).
+ *
+ * @implNote
+ *  This is designed for the case of {@code dstGraph} being comparable 
or much larger than
+ *  {@code srcGraph} or {@code srcGraph} having a lot of triples to 
actually be
+ *  deleted from {@code dstGraph}. This includes large, persistent 
{@code dstGraph}.
+ *
+ *  It is not designed for a large {@code srcGraph} and large {@code 
dstGraph} 
+ *  with only a few triples in common delete from {@code dstGraph}. It 
is better to
+ *  calculate the difference in someway, and copy into a small graph 
to use as the {@srcGraph}.  
--- End diff --

typo: some way


---


[jira] [Commented] (JENA-1414) Performance regression in Model.remove(Model m) method

2017-11-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259704#comment-16259704
 ] 

ASF GitHub Bot commented on JENA-1414:
--

Github user ajs6f commented on a diff in the pull request:

https://github.com/apache/jena/pull/306#discussion_r152085996
  
--- Diff: jena-core/src/main/java/org/apache/jena/graph/GraphUtil.java ---
@@ -246,43 +282,214 @@ private static void deleteIteratorWorkerDirect(Graph 
graph, Iterator it)
 }
 }
 
-private static final int sliceSize = 1000 ;
-/** A safe and cautious remove() function that converts the remove to
- *  a number of {@link Graph#delete(Triple)} operations. 
+private static int MIN_SRC_SIZE   = 1000 ;
+// If source and destination are large, limit the search for the best 
way round to "deleteFrom" 
+private static int MAX_SRC_SIZE   = 1000*1000 ;
+private static int DST_SRC_RATIO  = 2 ;
+
+/**
+ * Delete triples in the destination (arg 1) as given in the source 
(arg 2).
+ *
+ * @implNote
+ *  This is designed for the case of {@code dstGraph} being comparable 
or much larger than
+ *  {@code srcGraph} or {@code srcGraph} having a lot of triples to 
actually be
+ *  deleted from {@code dstGraph}. This includes large, persistent 
{@code dstGraph}.
+ *
+ *  It is not designed for a large {@code srcGraph} and large {@code 
dstGraph} 
+ *  with only a few triples in common delete from {@code dstGraph}. It 
is better to
+ *  calculate the difference in someway, and copy into a small graph 
to use as the {@srcGraph}.  
--- End diff --

typo: some way


> Performance regression in Model.remove(Model m) method
> --
>
> Key: JENA-1414
> URL: https://issues.apache.org/jira/browse/JENA-1414
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: Jena 3.3.0, Jena 3.4.0
>Reporter: Michał Woźniak
>Assignee: Andy Seaborne
> Attachments: graph_util_improve.patch
>
>
> Model.remove(Model) is very slow on large models, as it propagates to 
> GraphUtil.deleteFrom(Graph, Graph), which computes the size of the target graph 
> by iterating over all triples. This computation takes nearly 100% of the time 
> of the Model.remove(Model) operation.
> It seems this commit introduced the issue: 
> https://github.com/apache/jena/commit/781895ce64e062c7f2268a78189a777c39b92844#diff-fbb4d11dc804464f94c27e33e11b18e8
> Due to this bug, deletion of a concept scheme on a large ontology may take 
> several minutes. 
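
One general way to avoid a full size computation is to count only up to a bound. The sketch below is an illustration over plain Java iterators under that assumption; it is not the code from the linked commit, the pull request, or any Jena class, and the names are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;

class BoundedSize {
    // Returns true if the iterator yields at least `limit` elements.
    // It consumes at most `limit` elements, so the cost is bounded by
    // `limit` rather than by the total size of the underlying collection.
    static boolean atLeast(Iterator<?> it, int limit) {
        int seen = 0;
        while (seen < limit && it.hasNext()) {
            it.next();
            seen++;
        }
        return seen >= limit;
    }

    public static void main(String[] args) {
        System.out.println(atLeast(Arrays.asList(1, 2, 3).iterator(), 2)); // true
        System.out.println(atLeast(Arrays.asList(1, 2, 3).iterator(), 4)); // false
    }
}
```

A caller could use such a check to decide whether a collection is "large" without ever iterating over all of it.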





[GitHub] jena issue #306: Algorithms for JENA-1414

2017-11-20 Thread ajs6f
Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/306
  
Okay, now I get it. Agreed that number 3 is "trying too hard" and on the 
proposal to provide number 2 and document appropriate usage.


---


[jira] [Commented] (JENA-1414) Performance regression in Model.remove(Model m) method

2017-11-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259710#comment-16259710
 ] 

ASF GitHub Bot commented on JENA-1414:
--

Github user ajs6f commented on the issue:

https://github.com/apache/jena/pull/306
  
Okay, now I get it. Agreed that number 3 is "trying too hard" and on the 
proposal to provide number 2 and document appropriate usage.

