[jira] Issue Comment Edited: (JENA-9) LARQ as a separate module from ARQ

Paolo Castagna (JIRA) Tue, 21 Dec 2010 09:14:51 -0800

    [ https://issues.apache.org/jira/browse/JENA-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973762#action_12973762 ]


Paolo Castagna edited comment on JENA-9 at 12/21/10 12:02 PM:
--------------------------------------------------------------

> Merge JENA-5 fix 

I have merged the changes done by Andy to fix JENA-5 into the separate LARQ 
module.
I have also removed the @author annotations from comments, since I noticed Andy 
did it in ARQ.

> Upgrade Lucene version to 2.9.3 and fix tests (if there are failures). Remove 
> code using deprecated Lucene APIs and upgrade to Lucene 3.0.x. 

Done, LARQ is now using Lucene 3.0.3. However, it is possible to move back to 
Lucene 2.9.3 as a drop-in replacement (if someone needs/wants this).

> Decide how many results to return when the user does not specify it, 1000? 
> More? 

It's now a constant in LARQ.java, set to 1000.

> Should we use the index to suppress duplicates instead of in-memory data 
> structures? 

IndexBuilderLiteral.java is now using the index rather than an in-memory data 
structure to avoid adding duplicate documents to the Lucene index:

    if ( ! super.index.getIndex().hasMatch(
            LARQ.fLex + ":\"" + node.getLiteralLexicalForm() + "\"" ) )
    {
        if ( indexThisLiteral(s.getLiteral()) )
            index.index(node, node.getLiteralLexicalForm()) ;
    }
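
To illustrate the idea outside of LARQ, here is a minimal sketch of "ask the index before adding" in plain Java. ToyLiteralIndex, DedupSketch and indexOnce are hypothetical names invented for this example; the list of documents only stands in for a Lucene index (which, like the list, accepts duplicate documents) and is not LARQ's or Lucene's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Lucene-backed index: like Lucene, it will
// happily store the same document twice unless the caller checks first.
final class ToyLiteralIndex {
    private final List<String> documents = new ArrayList<>();

    // Plays the role of hasMatch(...) in the snippet above.
    boolean hasMatch(String lexicalForm) {
        return documents.contains(lexicalForm);
    }

    void index(String lexicalForm) {
        documents.add(lexicalForm);
    }

    int documentCount() {
        return documents.size();
    }
}

final class DedupSketch {
    // Adds the literal only if the index does not already contain it.
    // Returns true if a document was added, false if it was a duplicate.
    static boolean indexOnce(ToyLiteralIndex index, String lexicalForm) {
        if (index.hasMatch(lexicalForm)) {
            return false;
        }
        index.index(lexicalForm);
        return true;
    }

    public static void main(String[] args) {
        ToyLiteralIndex index = new ToyLiteralIndex();
        indexOnce(index, "hello");
        indexOnce(index, "hello"); // skipped: already in the index
        System.out.println(index.documentCount());
    }
}
```

The point of the design is that no second data structure has to be kept in sync with the index; the cost is one lookup per literal.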

> We could use the Model to decide when there are no more triples with a 
> specified literal and therefore it's ok to remove it from Lucene. 

Done, for example, look at IndexBuilderLiteral.java:

    public void unindexStatement(Statement s)
    { 
        if ( ! indexThisStatement(s) )
            return ;

        if ( s.getObject().isLiteral() )
        {
            // we use the Model as reference counting
            StmtIterator iter = s.getModel().listStatements(
                    (Resource)null, (Property)null, s.getObject()) ;
            if ( ! iter.hasNext() ) {
                Node node = s.getObject().asNode() ;
                if ( indexThisLiteral(s.getLiteral()) ) {
                    index.unindex(node, node.getLiteralLexicalForm()) ;
                }
            }
        }
    }
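
The same reference-counting idea can be sketched without Jena or Lucene. In the example below, UnindexSketch and its Triple class are hypothetical and exist only for illustration: a list of triples plays the role of the Model and a set of strings plays the role of the index. The sketch assumes the triple has already been removed from the model when the check runs; the comment above does not say whether that holds for unindexStatement.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UnindexSketch {
    // Hypothetical triple whose object is always a literal.
    static final class Triple {
        final String subject;
        final String predicate;
        final String literal;

        Triple(String subject, String predicate, String literal) {
            this.subject = subject;
            this.predicate = predicate;
            this.literal = literal;
        }
    }

    private final List<Triple> model = new ArrayList<>(); // stands in for the Model
    private final Set<String> index = new HashSet<>();    // stands in for the Lucene index

    void add(Triple t) {
        model.add(t);
        index.add(t.literal);
    }

    void remove(Triple t) {
        model.remove(t); // Triple does not override equals, so this removes that instance
        // The model acts as the reference count: the literal leaves the
        // index only when no remaining triple still uses it.
        for (Triple other : model) {
            if (other.literal.equals(t.literal)) {
                return;
            }
        }
        index.remove(t.literal);
    }

    boolean isIndexed(String literal) {
        return index.contains(literal);
    }
}
```

With two triples sharing the literal "hello", removing the first leaves "hello" indexed and removing the second drops it.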

> See how the new NRT capabilities of Lucene can be used from LARQ. 

See IndexBuilderBase.java:

    protected IndexReader getIndexReader()
    {
        try {
            flushWriter() ;
            if ( indexWriter != null ) {
                // Let's use the Near Real Time (NRT)
                return indexWriter.getReader() ;
            } else {
                return IndexReader.open(dir, true) ;
            }
        } catch (Exception e) {
            throw new ARQLuceneException("getIndexReader", e) ;
        }
    }

> Review package names (currently c.h.h.j.sparql.larq and c.h.h.j.query.larq). 
> Should we move to c.h.h.j.larq.*? 

I think we should, but I have not done it yet.

Indeed, we could change to org.apache.jena.larq.*. What do you think?

> LARQ as a separate module from ARQ
> ----------------------------------
>
>                 Key: JENA-9
>                 URL: https://issues.apache.org/jira/browse/JENA-9
>             Project: Jena
>          Issue Type: Task
>          Components: LARQ
>            Reporter: Paolo Castagna
>            Assignee: Paolo Castagna
>
> LARQ can be extracted from ARQ as a separate module depending on ARQ.
> ARQ should not depend on LARQ (to avoid dependency cycles) and it could check 
> if LARQ is available in the classpath and wire the property function in 
> dynamically.
> LARQ can have a different release cycle from ARQ and people who do not need 
> free text search will not need to include Lucene in their classpath.
> A separate (experimental) module is available here: 
> https://jena.svn.sourceforge.net/svnroot/jena/LARQ/trunk/
> List of things to do/decide includes:
>  - Merge JENA-5 fix 
>  - Upgrade Lucene version to 2.9.3 and fix tests (if there are failures).
>  - Remove code using deprecated Lucene APIs and upgrade to Lucene 3.0.x.
>  - Decide how many results to return when the user does not specify it, 1000? 
> More?
>  - Should we use the index to suppress duplicates instead of in-memory data 
> structures?
>  - How do we implement removals/unindex?
>     - We could use the Model to decide when there are no more triples with a 
> specified literal and therefore it's ok to remove it from Lucene.
>  - See how the new NRT capabilities of Lucene can be used from LARQ.
>  - Review package names (currently c.h.h.j.sparql.larq and 
> c.h.h.j.query.larq). Should we move to c.h.h.j.larq.*?
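
The description above suggests that ARQ could check whether LARQ is on the classpath and wire the property function in dynamically. A minimal sketch of such a check follows, assuming a plain Class.forName lookup is enough; OptionalModuleSketch and the marker class name are hypothetical, and the actual registration call is left out because the issue does not specify it.

```java
final class OptionalModuleSketch {
    // Returns true if the named class can be located, without initializing it.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, OptionalModuleSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical marker class; the real LARQ class name may differ.
        String marker = "org.apache.jena.larq.LARQ";
        if (isAvailable(marker)) {
            System.out.println("LARQ found: register the property function");
        } else {
            System.out.println("LARQ not found: free text search stays disabled");
        }
    }
}
```

Because the lookup is by name, the core module keeps no compile-time dependency on the optional one, which avoids the dependency cycle mentioned in the description.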

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
