[GitHub] jena pull request: jena-text updates for constructing documents su...

2015-03-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/42


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] jena pull request: jena-text updates for constructing documents su...

2015-03-13 Thread ehedgehog
GitHub user ehedgehog opened a pull request:

https://github.com/apache/jena/pull/42

jena-text updates for constructing documents suitable for conjunctive 
queries

This change to jena-text allows TextDocumentProducers
to access the dataset they are monitoring, and to
create indexes where there is a single (Lucene) document
for a given subject and all its defined properties
rather than a separate document per triple that the
subject appears in.

This allows indexes to be used for conjunctive queries,
e.g. a request such as

city: Plymouth AND street: Station

See https://github.com/epimorphics/ppd-text-index for
an example project that uses conjunctive queries and
provides a bulk index creation utility.
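
The difference between the two indexing layouts can be illustrated with
a toy model. This is only a sketch: a "document" here is a plain map from
field name to text, not a Lucene Document, and the class name, method
names and sample triples are all made up for illustration. A real index
would also allow repeated fields, which this model does not.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: this is not the Lucene or jena-text API.
class ConjunctiveSketch {

    // Each triple is {subject, field, text}.
    // One document per triple: every document carries a single property field.
    static List<Map<String, String>> perTriple(List<String[]> triples) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (String[] t : triples) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("uri", t[0]);
            doc.put(t[1], t[2]);
            docs.add(doc);
        }
        return docs;
    }

    // One document per subject: all of a subject's properties share a document.
    static List<Map<String, String>> perSubject(List<String[]> triples) {
        Map<String, Map<String, String>> bySubject = new LinkedHashMap<>();
        for (String[] t : triples) {
            Map<String, String> doc = bySubject.computeIfAbsent(t[0], s -> {
                Map<String, String> d = new LinkedHashMap<>();
                d.put("uri", s);
                return d;
            });
            doc.put(t[1], t[2]);
        }
        return new ArrayList<>(bySubject.values());
    }

    // Counts documents in which every queried field contains its term (AND).
    static long countMatches(List<Map<String, String>> docs, Map<String, String> query) {
        return docs.stream()
            .filter(doc -> query.entrySet().stream().allMatch(
                q -> doc.getOrDefault(q.getKey(), "").contains(q.getValue())))
            .count();
    }

    public static void main(String[] args) {
        List<String[]> triples = List.of(
            new String[] {"ex:a1", "city", "Plymouth"},
            new String[] {"ex:a1", "street", "Station Road"},
            new String[] {"ex:a2", "city", "Plymouth"},
            new String[] {"ex:a2", "street", "High Street"});
        Map<String, String> query = Map.of("city", "Plymouth", "street", "Station");
        System.out.println("per-triple matches:  " + countMatches(perTriple(triples), query));
        System.out.println("per-subject matches: " + countMatches(perSubject(triples), query));
    }
}
```

With one document per triple, no single document holds both a city and
a street field, so the conjunction cannot match; with one document per
subject it can.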

The changes to jena-text are spread over six files as follows:

# jena-text/src/main/java/org/apache/jena/query/text/DatasetGraphText.java 

The two-phase commit protocol is modified to ensure that finish()
is called on the DatasetChanges monitor before the commit protocol
starts. It is possible for a DatasetChanges to have buffered-up
changes which have not yet been applied; applying these
changes mid-commit can cause errors.

[This problem was detected in https://github.com/epimorphics/ppd-text-index
where TextDocProducerBatch does have buffered state and closing
the DatasetGraphText threw an exception when the state was flushed.]
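
The ordering issue can be sketched with stand-in classes. Everything
below is hypothetical (ToyIndex, BufferingMonitor and the commit method
are not the jena-text classes); it only illustrates why buffered changes
should be flushed before the commit begins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; these are not the jena-text classes.
class CommitOrderSketch {

    // Minimal index that rejects writes once its commit has begun.
    static class ToyIndex {
        final List<String> written = new ArrayList<>();
        boolean committing = false;

        void write(String entry) {
            if (committing) {
                throw new IllegalStateException("write during commit: " + entry);
            }
            written.add(entry);
        }
    }

    // Change monitor that buffers changes and applies them only in finish().
    static class BufferingMonitor {
        private final ToyIndex index;
        private final List<String> pending = new ArrayList<>();

        BufferingMonitor(ToyIndex index) {
            this.index = index;
        }

        void change(String entry) {
            pending.add(entry);
        }

        void finish() {
            for (String entry : pending) {
                index.write(entry);
            }
            pending.clear();
        }
    }

    // Ordering described in the text: flush the monitor, then start the commit.
    static void commit(BufferingMonitor monitor, ToyIndex index) {
        monitor.finish();          // buffered changes are applied first
        index.committing = true;   // the commit protocol starts here
        // ... prepare and commit the index and the dataset ...
        index.committing = false;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        BufferingMonitor monitor = new BufferingMonitor(index);
        monitor.change("ex:a1 city Plymouth");
        commit(monitor, index);
        System.out.println(index.written);
    }
}
```

If finish() were instead called after the commit had started, the toy
index would throw, which is the kind of failure described above.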

A test for this behaviour is not currently available.

# jena-text/src/main/java/org/apache/jena/query/text/TextIndex.java 

Added the abstract method updateEntity, contrasted with addEntity().
Updating an index with updateEntity is intended to discard any
existing Document with the entity key and create a new one with
all and only the fields specified by the given Entity, as opposed
to addEntity which creates a new Document from the Entity even if
one with that key already exists.

This allows Documents suitable for conjunctive query to
be created. The project 
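
The intended difference between the two methods can be shown with a toy
in-memory index. This is a sketch of the semantics described above, not
the TextIndex interface: entities are reduced to a key plus a field map,
and the "uri" key field name is an assumption made for the example.
Lucene's IndexWriter.updateDocument has similar delete-then-add
behaviour, though this note does not claim that the pull request uses it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory index; it only mirrors the add/update semantics in the text.
class UpdateEntitySketch {

    // Documents in insertion order; each is a field map with a "uri" key field.
    final List<Map<String, String>> docs = new ArrayList<>();

    private static Map<String, String> toDoc(String key, Map<String, String> fields) {
        Map<String, String> doc = new LinkedHashMap<>(fields);
        doc.put("uri", key);
        return doc;
    }

    // addEntity-style: always appends a new document, even if the key exists.
    void addEntity(String key, Map<String, String> fields) {
        docs.add(toDoc(key, fields));
    }

    // updateEntity-style: discards any document with this key, then adds one
    // holding all and only the given fields.
    void updateEntity(String key, Map<String, String> fields) {
        docs.removeIf(doc -> key.equals(doc.get("uri")));
        docs.add(toDoc(key, fields));
    }

    // Number of documents currently stored for the given key.
    long count(String key) {
        return docs.stream().filter(doc -> key.equals(doc.get("uri"))).count();
    }

    public static void main(String[] args) {
        UpdateEntitySketch index = new UpdateEntitySketch();
        index.addEntity("ex:a1", Map.of("city", "Plymouth"));
        index.addEntity("ex:a1", Map.of("street", "Station Road"));
        System.out.println("after two adds: " + index.count("ex:a1"));

        index.updateEntity("ex:a1", Map.of("city", "Plymouth", "street", "Station Road"));
        System.out.println("after update:   " + index.count("ex:a1"));
    }
}
```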

# jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java 

Implement updateEntity for a TextIndexLucene.

# jena-text/src/main/java/org/apache/jena/query/text/TextIndexSolr.java 

Placeholder for a Solr counterpart of TextIndexLucene.updateEntity; at the
moment we do not support it.

# jena-text/src/main/java/org/apache/jena/query/text/assembler/TextDatasetAssembler.java

The DatasetAssembler may construct a non-default TextDocProducer to
feed to the TextDatasetFactory, passing in to the constructor the
TextIndex that the TextDocProducer uses. This change additionally
allows for a two-argument constructor taking the DatasetGraph as
well as the TextIndex as arguments.

The TextDocProducer can use this to query the dataset for triples,
e.g. other triples with the same subject as one it has received, so
as to build all the properties into a single Document.
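
One way such constructor selection could work is by reflection: try the
two-argument constructor first and fall back to the one-argument form.
The sketch below uses made-up stand-in types (ToyDataset, ToyIndex and
the two producer classes), and the argument order (dataset, then index)
is an assumption; it is not the assembler code from this pull request.

```java
// Hypothetical sketch of reflective constructor selection.
class ProducerFactorySketch {

    // Stand-ins for DatasetGraph and TextIndex.
    static class ToyDataset {}
    static class ToyIndex {}

    // Producer that only needs the index.
    public static class OneArgProducer {
        final ToyIndex index;

        public OneArgProducer(ToyIndex index) {
            this.index = index;
        }
    }

    // Producer that also wants access to the dataset it monitors.
    public static class TwoArgProducer {
        final ToyDataset dataset;
        final ToyIndex index;

        public TwoArgProducer(ToyDataset dataset, ToyIndex index) {
            this.dataset = dataset;
            this.index = index;
        }
    }

    // Prefers a public (dataset, index) constructor; otherwise uses (index).
    static Object instantiate(Class<?> type, ToyDataset dataset, ToyIndex index) {
        try {
            try {
                return type.getConstructor(ToyDataset.class, ToyIndex.class)
                           .newInstance(dataset, index);
            } catch (NoSuchMethodException noTwoArgConstructor) {
                return type.getConstructor(ToyIndex.class).newInstance(index);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot construct " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        ToyDataset dataset = new ToyDataset();
        ToyIndex index = new ToyIndex();
        System.out.println(instantiate(OneArgProducer.class, dataset, index).getClass().getSimpleName());
        System.out.println(instantiate(TwoArgProducer.class, dataset, index).getClass().getSimpleName());
    }
}
```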

# jena-text/src/test/java/org/apache/jena/query/text/assembler/TestTextDatasetAssembler.java

Adds a test that the two-argument constructor is called
when appropriate.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/epimorphics/jena-config-doc-producer revised-A

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/42.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #42


commit 3ff763ac184dd49bdc3b6ceff9acb778aac29eae
Author: Chris Dollin ehog.he...@googlemail.com
Date:   2015-03-11T15:13:43Z

Compacted and simplified changes to jena-text
to support ppd-text-index.

commit a8002e8ce3452aad7c51f20600c9389ef71a6dd5
Author: Chris Dollin ehog.he...@googlemail.com
Date:   2015-03-12T15:25:27Z

Added test to check that the
assembler gives access to the two-argument constructor of a docProducer.



