[ https://issues.apache.org/jira/browse/JENA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340105#comment-14340105 ]

ASF GitHub Bot commented on JENA-686:
-------------------------------------

GitHub user ehedgehog opened a pull request:

    https://github.com/apache/jena/pull/39

    Updated text indexing

    This change addresses JENA-686, support for cross field conjunctive
    queries in jena-text, by allowing TextDocProducers to be specified in a
    TextDatasetAssembler assembly and existing index entities to be updated
    as well as added.
    
    DatasetTextGraph - commit's monitor finish() moved to the top
        of the method. This is because a batching TextDocProducer
        (as we have in our external app) may have quads buffered up
        awaiting end-of-batch and they must be flushed by finish();
        if they are not, they are auto-flushed after the commit has
        been run and an exception is thrown.
    
    TextDatasetFactory - needed methods to create datasets with
        doc producers and close-index-on-close flags
    
    TextIndex - added new operation updateEntity to allow update
        of (possibly) existing entities
    
    TextIndexLucene - added implementation of updateEntity. Added
        deleteDocuments(Term) for deletion of documents (as is used
        in ppd-text-index text doc producer batch).
    
    TextDatasetAssembler - updated to allow specification of doc
        producer (since done by mainline Jena) and
        close-index-on-close.
    
    AbstractTestDatasetWithLuceneIndex, DummyDocProducer,
    TestTextDatasetAssembler - fix up some issues in the test 
        framework (use of statics vs use of instance variables,
        remembering to close datasets in @After when done, etc).
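
The ordering issue described for DatasetTextGraph can be sketched in plain Java. This is only an illustration, not the jena-text code: SketchIndex, BatchingProducer and both helper methods are hypothetical names, and the stand-in index simply refuses writes once it has been committed. It shows why a producer that buffers quads has to be flushed by finish() before the commit runs.

```java
import java.util.ArrayList;
import java.util.List;

public class FinishBeforeCommit {
    /** Hypothetical stand-in for a text index; it rejects writes after commit. */
    static class SketchIndex {
        private final List<String> pending = new ArrayList<>();
        private final List<String> committed = new ArrayList<>();
        private boolean open = true;

        void add(String doc) {
            if (!open) {
                throw new IllegalStateException("write after commit: " + doc);
            }
            pending.add(doc);
        }

        void commit() {
            committed.addAll(pending);
            pending.clear();
            open = false;
        }

        List<String> committed() {
            return committed;
        }
    }

    /** Hypothetical batching producer: buffers quads until finish() is called. */
    static class BatchingProducer {
        private final SketchIndex index;
        private final List<String> buffer = new ArrayList<>();

        BatchingProducer(SketchIndex index) {
            this.index = index;
        }

        void change(String quad) {
            buffer.add(quad);
        }

        void finish() {
            for (String quad : buffer) {
                index.add(quad);
            }
            buffer.clear();
        }
    }

    /** finish() first, then commit: every buffered quad reaches the index. */
    static List<String> finishThenCommit(String... quads) {
        SketchIndex index = new SketchIndex();
        BatchingProducer producer = new BatchingProducer(index);
        for (String quad : quads) {
            producer.change(quad);
        }
        producer.finish();
        index.commit();
        return index.committed();
    }

    /** commit first, then the late flush: returns true if the flush throws. */
    static boolean commitThenFinishFails(String... quads) {
        SketchIndex index = new SketchIndex();
        BatchingProducer producer = new BatchingProducer(index);
        for (String quad : quads) {
            producer.change(quad);
        }
        index.commit();
        try {
            producer.finish();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(finishThenCommit("q1", "q2"));   // [q1, q2]
        System.out.println(commitThenFinishFails("q1"));    // true
    }
}
```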


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/epimorphics/jena-config-doc-producer updated-text-indexing

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/jena/pull/39.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #39
    
----
commit 1a919696739e60d46f92b1e58d0fbefb50c14dee
Author: Chris Dollin <[email protected]>
Date:   2015-02-19T15:44:20Z

    Integrated changes to jena-text
    from work on JENA-686.

commit 5c4e91981f3564252cd74d638285497d149229b2
Author: Chris Dollin <[email protected]>
Date:   2015-02-20T11:31:27Z

    Merged in the changes since the apache-jena
    fork (over 400 commits) and fixed up the jena-text changes that had
    conflicts (mostly due to the automatic merge not handling all the
    cases well).

commit c702ba48388304dfaf88e84788aed59d3566d685
Author: Chris Dollin <[email protected]>
Date:   2015-02-26T09:40:33Z

    Fixed conflicts following merge with latest jena master.

commit d7040a9955a62eeef2e16d38eefb101420a07c06
Author: Chris Dollin <[email protected]>
Date:   2015-02-26T12:02:52Z

    Updated dataset assembler so that it can handle document
    producers that need a dataset as well as a text index, viz. the
    dependent text indexer.

----


> Add support for cross field conjunctive queries in jena-text
> ------------------------------------------------------------
>
>                 Key: JENA-686
>                 URL: https://issues.apache.org/jira/browse/JENA-686
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: Text
>    Affects Versions: Jena 2.11.1, Fuseki 1.0.1
>            Reporter: Brian McBride
>            Assignee: christopher james dollin
>         Attachments: TestDatasetWithBatchProducer.java
>
>
> We have a project where we are doing text search on addresses and wish to do 
> jena text queries like "city:liverpool AND street:green".  These queries 
> return no results, whilst queries like "street:green AND street:lane" work 
> fine.
> The reason is that jena text indexes each property in a separate Lucene
> document, so there is no Lucene document matching city:liverpool AND
> street:green; there are two documents, one for each property.
> Given the scale of our data, we really want to do the conjunctive query in 
> Lucene and not two separate queries and then a filter in SPARQL.
> I will attach a test case from an attempt to solve this for us to illustrate.
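
The behaviour reported in the issue can be reproduced with a small plain-Java model. This is a sketch under stated assumptions, not Lucene or jena-text code: a "document" is modelled as a map from field name to a set of lower-cased tokens, a conjunctive query as a list of field/token pairs that must all match one document, and every class and method name is hypothetical. With one document per property the cross-field query matches nothing; with one document per entity, updated as each property arrives, it matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CrossFieldSketch {
    /** Simplistic tokenizer (an assumption): lower-case, split on whitespace. */
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
    }

    /** One document per (entity, property, value) triple. */
    static List<Map<String, Set<String>>> perProperty(String[][] triples) {
        List<Map<String, Set<String>>> docs = new ArrayList<>();
        for (String[] t : triples) {
            Map<String, Set<String>> doc = new HashMap<>();
            doc.put(t[1], tokens(t[2]));
            docs.add(doc);
        }
        return docs;
    }

    /** One document per entity; later properties update the existing document. */
    static Collection<Map<String, Set<String>>> perEntity(String[][] triples) {
        Map<String, Map<String, Set<String>>> byEntity = new LinkedHashMap<>();
        for (String[] t : triples) {
            byEntity.computeIfAbsent(t[0], k -> new HashMap<>())
                    .computeIfAbsent(t[1], k -> new HashSet<>())
                    .addAll(tokens(t[2]));
        }
        return byEntity.values();
    }

    /** Counts documents in which every {field, token} pair matches. */
    static int countMatches(Collection<Map<String, Set<String>>> docs, String[][] query) {
        int count = 0;
        for (Map<String, Set<String>> doc : docs) {
            boolean all = true;
            for (String[] term : query) {
                Set<String> fieldTokens = doc.get(term[0]);
                if (fieldTokens == null || !fieldTokens.contains(term[1])) {
                    all = false;
                    break;
                }
            }
            if (all) {
                count++;
            }
        }
        return count;
    }

    // Made-up sample data: one address entity with two properties.
    static final String[][] TRIPLES = {
        {"addr1", "city", "Liverpool"},
        {"addr1", "street", "Green Lane"},
    };
    static final String[][] CROSS_FIELD = {{"city", "liverpool"}, {"street", "green"}};
    static final String[][] SAME_FIELD = {{"street", "green"}, {"street", "lane"}};

    public static void main(String[] args) {
        System.out.println(countMatches(perProperty(TRIPLES), CROSS_FIELD)); // 0
        System.out.println(countMatches(perProperty(TRIPLES), SAME_FIELD));  // 1
        System.out.println(countMatches(perEntity(TRIPLES), CROSS_FIELD));   // 1
    }
}
```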



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
