[
https://issues.apache.org/jira/browse/JENA-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973762#action_12973762
]
Paolo Castagna edited comment on JENA-9 at 12/21/10 12:02 PM:
--------------------------------------------------------------
> Merge JENA-5 fix
I have merged Andy's changes that fix JENA-5 into the separate LARQ
module.
I have also removed the @author annotations from comments, since I noticed Andy
did it in ARQ.
> Upgrade Lucene version to 2.9.3 and fix tests (if there are failures). Remove
> code using deprecated Lucene APIs and upgrade to Lucene 3.0.x.
Done, LARQ is now using Lucene 3.0.3. However, it is possible to move back to
Lucene 2.9.3 as a drop-in replacement (if someone needs/wants this).
> Decide how many results to return when the user does not specify it, 1000?
> More?
It is now a constant in LARQ.java, set to 1000.
> Should we use the index to suppress duplicates instead of in-memory data
> structures?
IndexBuilderLiteral.java is now using the index rather than an in-memory data
structure to avoid adding duplicate documents to the Lucene index:
    if ( ! super.index.getIndex().hasMatch(LARQ.fLex + ":\"" +
            node.getLiteralLexicalForm() + "\"" ))
    {
        if ( indexThisLiteral(s.getLiteral()))
            index.index(node, node.getLiteralLexicalForm()) ;
    }
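The check-before-add pattern in the snippet above can be illustrated in isolation. What follows is only a minimal sketch in plain Java with no Lucene or Jena dependency: FakeIndex is a hypothetical stand-in for the Lucene-backed index (here just a list), and indexIfAbsent is an invented name, not part of LARQ. It shows the one point that matters: the index itself is asked whether the literal is already present, so no separate in-memory set of "seen" literals is needed.

```java
import java.util.ArrayList;
import java.util.List;

final class DedupSketch {
    /** Hypothetical stand-in for the Lucene-backed index; not the LARQ API. */
    public static final class FakeIndex {
        private final List<String> documents = new ArrayList<>();

        /** Plays the role of hasMatch(...): is this lexical form already indexed? */
        public boolean hasMatch(String lexicalForm) {
            return documents.contains(lexicalForm);
        }

        public void add(String lexicalForm) {
            documents.add(lexicalForm);
        }

        public int size() {
            return documents.size();
        }
    }

    /** Adds the literal only if the index does not already hold it. */
    public static boolean indexIfAbsent(FakeIndex index, String lexicalForm) {
        if (index.hasMatch(lexicalForm)) {
            return false; // already indexed, skip the duplicate
        }
        index.add(lexicalForm);
        return true;
    }
}
```

With a real Lucene index the lookup costs a query per literal, so this trades some indexing speed for lower memory use.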
> We could use the Model to decide when there are no more triples with a
> specified literal and therefore it's ok to remove it from Lucene.
Done. For example, see IndexBuilderLiteral.java:
    public void unindexStatement(Statement s)
    {
        if ( ! indexThisStatement(s) )
            return ;
        if ( s.getObject().isLiteral() )
        {
            // we use the Model as reference counting
            StmtIterator iter = s.getModel().listStatements((Resource)null,
                    (Property)null, s.getObject());
            if ( ! iter.hasNext() ) {
                Node node = s.getObject().asNode() ;
                if ( indexThisLiteral(s.getLiteral())) {
                    index.unindex(node, node.getLiteralLexicalForm()) ;
                }
            }
        }
    }
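The "Model as reference counting" idea can also be shown without Jena or Lucene. The sketch below is an illustration under assumptions, not the LARQ implementation: the list of triples stands in for the Model, the set of strings stands in for the Lucene index, and all class and method names are hypothetical. A literal is dropped from the index only when, after a removal, no remaining triple still uses it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

final class UnindexSketch {
    /** Hypothetical minimal triple; the real code works with Jena Statements. */
    private static final class Triple {
        final String subject;
        final String predicate;
        final String literal;

        Triple(String subject, String predicate, String literal) {
            this.subject = subject;
            this.predicate = predicate;
            this.literal = literal;
        }
    }

    private final List<Triple> model = new ArrayList<>(); // stand-in for the Model
    private final Set<String> index = new HashSet<>();    // stand-in for the Lucene index

    public void add(String subject, String predicate, String literal) {
        model.add(new Triple(subject, predicate, literal));
        index.add(literal); // a set, so re-adding the same literal is a no-op
    }

    public void remove(String subject, String predicate, String literal) {
        boolean removed = false;
        for (Iterator<Triple> it = model.iterator(); it.hasNext(); ) {
            Triple t = it.next();
            if (t.subject.equals(subject) && t.predicate.equals(predicate)
                    && t.literal.equals(literal)) {
                it.remove();
                removed = true;
                break;
            }
        }
        // The model acts as the reference count: unindex only when no
        // remaining triple still carries this literal.
        if (removed && !literalStillUsed(literal)) {
            index.remove(literal);
        }
    }

    public boolean isIndexed(String literal) {
        return index.contains(literal);
    }

    private boolean literalStillUsed(String literal) {
        for (Triple t : model) {
            if (t.literal.equals(literal)) {
                return true;
            }
        }
        return false;
    }
}
```

As in the snippet above, the check runs after the statement has left the model, so the literal survives in the index for as long as at least one other statement still refers to it.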
> See how the new NRT capabilities of Lucene can be used from LARQ.
See IndexBuilderBase.java:
    protected IndexReader getIndexReader()
    {
        try {
            flushWriter() ;
            if ( indexWriter != null ) {
                return indexWriter.getReader() ; // Let's use the Near Real Time (NRT)
            } else {
                return IndexReader.open(dir, true) ;
            }
        } catch (Exception e) { throw new ARQLuceneException("getIndexReader", e) ; }
    }
> Review package names (currently c.h.h.j.sparql.larq and c.h.h.j.query.larq).
> Should we move to c.h.h.j.larq.*?
I think we should, but I have not done it yet.
Indeed, we could change to org.apache.jena.larq.*. What do you think?
> LARQ as a separate module from ARQ
> ----------------------------------
>
> Key: JENA-9
> URL: https://issues.apache.org/jira/browse/JENA-9
> Project: Jena
> Issue Type: Task
> Components: LARQ
> Reporter: Paolo Castagna
> Assignee: Paolo Castagna
>
> LARQ can be extracted from ARQ as a separate module depending on ARQ.
> ARQ should not depend on LARQ (to avoid dependency cycles) and it could check
> if LARQ is available in the classpath and wire the property function in
> dynamically.
> LARQ can have a different release cycle from ARQ and people who do not need
> free text search will not need to include Lucene in their classpath.
> A separate (experimental) module is available here:
> https://jena.svn.sourceforge.net/svnroot/jena/LARQ/trunk/
> List of things to do/decide includes:
> - Merge JENA-5 fix
> - Upgrade Lucene version to 2.9.3 and fix tests (if there are failures).
> - Remove code using deprecated Lucene APIs and upgrade to Lucene 3.0.x.
> - Decide how many results to return when the user does not specify it, 1000?
> More?
> - Should we use the index to suppress duplicates instead of in-memory data
> structures?
> - How do we implement removals/unindex?
> - We could use the Model to decide when there are no more triples with a
> specified literal and therefore it's ok to remove it from Lucene.
> - See how the new NRT capabilities of Lucene can be used from LARQ.
> - Review package names (currently c.h.h.j.sparql.larq and
> c.h.h.j.query.larq). Should we move to c.h.h.j.larq.*?
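The issue description suggests that ARQ could check whether LARQ is on the classpath and wire the property function in dynamically. One common way to do such a check, shown here only as a sketch, is to probe for a class by name without initialising it. OptionalModuleSketch and isAvailable are hypothetical names, and the class name passed in would be whatever LARQ class ARQ chooses to look for, which this thread does not decide.

```java
final class OptionalModuleSketch {
    /**
     * Returns true if the named class can be loaded from the classpath.
     * The class is not initialised, so probing has no side effects.
     */
    public static boolean isAvailable(String className) {
        try {
            Class.forName(className, false,
                    OptionalModuleSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing, or present but its own dependencies are not.
            return false;
        }
    }
}
```

With a check like this, ARQ would register the property function only when the probe succeeds, and would need no compile-time dependency on LARQ or Lucene.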