Hi Andy,
On 10 May 2011 21:09, Andy Seaborne <[email protected]> wrote:
>>>> Having LARQ in Fuseki is really valuable IMHO.
>>>> Using the new LARQ keeps Lucene indexes up-to-date with TDB and it avoids
>>>> duplicates. This, IMHO, is another really valuable thing for LARQ users.
>>>
>>> Not disagreeing but there seems to be no migration plan or documentation.
>>
>> If we leave the old/legacy LARQ in ARQ as it is, no migration plan is
>> necessary.
>> People can decide which one to use.
>
> I thought that indexes built with ARQ do not work with LARQ ones due to the
> new field. At least, flipping between the two could cause issues?
I wrote that people can decide which one to use, not that they can flip
between the two.
Switching from the old/legacy LARQ in ARQ to the new LARQ requires
re-indexing (larqbuild can be used for that).
However, leaving LARQ in ARQ allows people to choose when they want to migrate.
ARQ can be released, and people can upgrade ARQ without reindexing (if
they are using the old LARQ).
> What about describing the assembler file changes needed to connect the text
> index to the dataset?
As you suggested, changes to the assembler file are minimal:
    <#ds1> rdf:type ja:RDFDataset ;
        ja:textIndex "/path/to/lucene/index" ;
Should LARQ use the ja namespace or a different one?
I'll provide a new paragraph describing this to be added to the LARQ
documentation.
>> Documentation: yes, necessary, but there are no actual changes from an API or
>> usage point of view. For developers, I'll document how to add a dependency on
>> LARQ artifacts if they want to use the new/separate module.
>>
>>> Why don't we just switch to LARQ now?
>>
>> I'd prefer to add LARQ to Fuseki first, so that we can make it easier for
>> people to try it out; they can send us feedback and, if it's OK, we can
>> then remove the old LARQ from ARQ.
>
> I'm not sure how much feedback that'll produce.
>
> Fuseki is distributed as a complete jar as well as via Maven. As a complete
> jar, it won't have LARQ in it, will it? So using it that way, the most common
> way, won't exercise LARQ?
Fuseki can have LARQ in it; one way to do that is by adding a dependency on LARQ:
    <dependency>
      <groupId>org.apache.jena</groupId>
      <artifactId>larq</artifactId>
      <version>${ver.larq}</version>
    </dependency>
LARQ has a transitive dependency on Lucene v3.1.0, so we need to
exclude the Lucene dependency from arq and arq-test in Fuseki's pom.xml.
This is what I tried to do at r8810.
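A rough sketch of such an exclusion follows. The arq coordinates, the
${ver.arq} property and the lucene-core artifact name are my assumptions for
illustration, not copied from Fuseki's actual pom.xml:

    <dependency>
      <groupId>com.hp.hpl.jena</groupId>
      <artifactId>arq</artifactId>
      <version>${ver.arq}</version>
      <exclusions>
        <!-- Assumed coordinates: drop the Lucene pulled in by ARQ so that
             only the Lucene v3.1.0 required by LARQ ends up on the classpath -->
        <exclusion>
          <groupId>org.apache.lucene</groupId>
          <artifactId>lucene-core</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

The same <exclusions> element would presumably be repeated on the arq-test
dependency.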
Currently, the distribution of Fuseki as a complete jar includes ARQ (which has
the old/legacy LARQ in it and Lucene v2.3.1). I don't see why it couldn't have
the new LARQ with Lucene v3.1.0 in it instead, if we decide to do so.
Having the new LARQ with Lucene v3.1.0 in Fuseki would be my favourite choice.
If the new LARQ is not in Fuseki, I don't see how one could use it with Fuseki
without patching the pom.xml and repackaging it, or creating a new project that
depends on both Fuseki and LARQ.
> And the assembler description needs to change to activate the index so even
> if it's included, the user has to do something to get its benefits.
Yes, users will need to add one line (i.e. ja:textIndex "/path/to/lucene/index")
to their assembler file, supposing they are using an assembler file.
If they are not using an assembler file, a command line option could be added
to point to a Lucene index directory (the same way it is done to point to a TDB
index directory). This would probably be even simpler for Fuseki users.
>> Leaving LARQ in ARQ minimizes risk and changes, and it offers a fallback
>> to people.
>>
>>> If it's not automatic (Lucene3 can read Lucene2 indexes?), then we need
>>> to document how the user can do an upgrade.
>>
>> The new LARQ adds a new field to the Lucene index (i.e. "hash") to support
>> unindex/deletes, see [1], therefore a reindex (using the usual larqbuild) is
>> necessary.
>
> I think that needs explaining somewhere + the assembler changes, or people
> may not realise the benefits of LARQ.
Yes, I agree. I'll do that.
Paolo