Hello All,

I have been given the enviable job of upgrading existing faceted taxonomy
indexes from Lucene 3.6 to 5.3.

To make sure that I have everything in working order, I have written a little
program to “smoke test” the upgrade. Facets retrieved in version 3 should be
retrievable in version 5, or our upgrade has failed.

Unfortunately, I can’t seem to put together a quick program to validate my data
once it is upgraded to version 5. Can someone tell me where I have gone off
the rails?



In this email, I include:

1. The 3.6.2 validation code, which establishes what should be seen after the
upgrade runs
1.1. Maven dependencies
1.2. source code
1.3. output
2. The Lucene upgrade shell script
3. The 5.3.1 validation code (which returns nulls and isn’t quite right)
3.1. Maven dependencies
3.2. source code
4. The URL of the compressed tar file of the index data, stored in Dropbox.

Here are the key Maven dependencies that I used for the 3.6 source:
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>3.6.0</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-facet</artifactId>
    <version>3.6.2</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-highlighter</artifactId>
    <version>3.6.0</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queries</artifactId>
    <version>3.6.0</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>3.6.0</version>
</dependency>


Here is the code that retrieves facet data from the version 3.6 index (it
works against Lucene 3.6):

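import java.io.File;
import java.util.List;

import org.apache.lucene.facet.search.FacetsCollector;
import org.apache.lucene.facet.search.params.CountFacetRequest;
import org.apache.lucene.facet.search.params.FacetSearchParams;
import org.apache.lucene.facet.search.results.FacetResult;
import org.apache.lucene.facet.taxonomy.CategoryPath;
import org.apache.lucene.facet.taxonomy.TaxonomyReader;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiCollector;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;

public class FacetRunner {
    public static void main(final String[] args) throws Exception {
        // Document index
        File indexDirFile = new File("/Users/scott/projects/prototypes/lucene-3-and-5/lucene3/data/doc-index/lucene");
        Directory indexDir = new SimpleFSDirectory(indexDirFile);
        IndexReader indexReader = IndexReader.open(indexDir);
        Searcher searcher = new IndexSearcher(indexReader);

        // Taxonomy (facet) index
        File taxonomyIndexDirFile = new File("/Users/scott/projects/prototypes/lucene-3-and-5/lucene3/data/facets");
        Directory taxonomyIndexDir = new SimpleFSDirectory(taxonomyIndexDirFile);
        TaxonomyReader taxo = new DirectoryTaxonomyReader(taxonomyIndexDir);

        // Match the documents carrying the $facets:$fulltree$ category-list term
        Term aTerm = new Term("$facets", "$fulltree$"); // alternative: new Term("text", "clarissa");
        Query q = new TermQuery(aTerm);
        TopScoreDocCollector tdc = TopScoreDocCollector.create(10, true);

        // Count the top 10 children of the brs_recipient_domain dimension
        FacetSearchParams facetSearchParams = new FacetSearchParams();
        facetSearchParams.addFacetRequest(new CountFacetRequest(
                new CategoryPath("brs_recipient_domain"), 10));

        FacetsCollector facetsCollector = new FacetsCollector(facetSearchParams, indexReader, taxo);

        searcher.search(q, MultiCollector.wrap(tdc, facetsCollector));
        List<FacetResult> res = facetsCollector.getFacetResults();
        for (FacetResult facetResult : res) {
            System.out.println(facetResult.toString());
        }
    }
}
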
Output looks like:

Request: brs_recipient_domain nRes=10 nLbl=10
Num valid Descendants (up to specified depth): 486
        Facet Result Node with 10 sub result nodes.
        Name: brs_recipient_domain
        Value: 2896.0
        Residue: 1497.0

        Subresult #0
                Facet Result Node with 0 sub result nodes.
                Name: brs_recipient_domain/enron.com
                Value: 1979.0
                Residue: 0.0

        Subresult #1
                Facet Result Node with 0 sub result nodes.
                Name: brs_recipient_domain/aol.com
                Value: 124.0
                Residue: 0.0

        Subresult #2
                Facet Result Node with 0 sub result nodes.
                Name: brs_recipient_domain/bracepatt.com
                Value: 84.0
                Residue: 0.0

        Subresult #3
                Facet Result Node with 0 sub result nodes.
                Name: brs_recipient_domain/txu.com
                Value: 63.0
                Residue: 0.0

        Subresult #4
                Facet Result Node with 0 sub result nodes.
                Name: brs_recipient_domain/hotmail.com
                Value: 46.0
                Residue: 0.0

        Subresult #5
                Facet Result Node with 0 sub result nodes.
                Name: brs_recipient_domain/teneo-test.com
                Value: 42.0
                Residue: 0.0

        Subresult #6
                Facet Result Node with 0 sub result nodes.
                Name: brs_recipient_domain/yahoo.com
                Value: 41.0
                Residue: 0.0

        Subresult #7
                Facet Result Node with 0 sub result nodes.
                Name: brs_recipient_domain/dttus.com
                Value: 34.0
                Residue: 0.0

        Subresult #8
                Facet Result Node with 0 sub result nodes.
                Name: brs_recipient_domain/velaw.com
                Value: 30.0
                Residue: 0.0

        Subresult #9
                Facet Result Node with 0 sub result nodes.
                Name: brs_recipient_domain/netzero.net
                Value: 28.0
                Residue: 0.0


Process finished with exit code 0
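Since the 3.6 output above is the baseline, the pass/fail part of the smoke test can be made mechanical. The sketch below is plain Java with no Lucene dependency, and the class and method names are hypothetical. It assumes the counts from each run have been collected into a Map keyed by the child label (enron.com rather than brs_recipient_domain/enron.com), which, as far as I know, is what LabelAndValue.label holds in 5.x, with labelValue.value.intValue() as the count. It compares per-label counts only: in the 3.6 output the ten children sum to 2471 and the residue adds 1497 more, so the children total 3968 against a dimension Value of 2896. The top-level figure is therefore a different quantity from the sum of its children and is left out of the comparison.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not Lucene API): compares facet counts captured before
// the upgrade (the 3.6 run) with counts captured after it (the 5.3.1 run).
public class FacetSmokeTest {

    // Returns one message per problem; an empty list means the check passed.
    // Labels that appear only in "actual" are ignored.
    static List<String> compare(Map<String, Integer> expected, Map<String, Integer> actual) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Integer> e : expected.entrySet()) {
            Integer got = actual.get(e.getKey());
            if (got == null) {
                problems.add("missing label: " + e.getKey());
            } else if (!got.equals(e.getValue())) {
                problems.add("count mismatch for " + e.getKey()
                        + ": expected " + e.getValue() + ", got " + got);
            }
        }
        return problems;
    }

    // Baseline copied from the top-10 brs_recipient_domain counts printed by the 3.6 run.
    static Map<String, Integer> baseline() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("enron.com", 1979);
        m.put("aol.com", 124);
        m.put("bracepatt.com", 84);
        m.put("txu.com", 63);
        m.put("hotmail.com", 46);
        m.put("teneo-test.com", 42);
        m.put("yahoo.com", 41);
        m.put("dttus.com", 34);
        m.put("velaw.com", 30);
        m.put("netzero.net", 28);
        return m;
    }

    public static void main(String[] args) {
        // An empty "actual" map models a 5.3.1 run that produced no counts at all.
        List<String> problems = compare(baseline(), new LinkedHashMap<>());
        problems.forEach(System.out::println);
        System.out.println(problems.isEmpty() ? "PASS" : "FAIL (" + problems.size() + " problems)");
    }
}
```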


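To upgrade the indexes, I have written a shell script that runs IndexUpgrader
twice on each index: first with the 4.10.4 core jar, to bring the facet
(taxonomy) index and the document index to version 4, and then with the 5.3.1
core and backward-codecs jars, to bring both to version 5.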


#!/bin/sh
# Index paths are relative: run this from the data directory that contains facets/ and doc-index/.

export JARS_HOME=/users/scott/projects/prototypes/lucene-3-and-5/jars

echo "===>>>>>migrating lucene data from 3 to 4<<<<<========="
echo
export LUCENE_4_PATH="$JARS_HOME/lucene-core-4.10.4.jar"

date "+DATE: %Y-%m-%d%nTIME: %H:%M:%S"
echo "upgrading facets taxonomy indices from 3 to 4 with command time java -cp $LUCENE_4_PATH org.apache.lucene.index.IndexUpgrader facets"
time java -cp "$LUCENE_4_PATH" org.apache.lucene.index.IndexUpgrader facets
echo
echo "upgrading document indices from 3 to 4 with command time java -cp $LUCENE_4_PATH org.apache.lucene.index.IndexUpgrader doc-index/lucene"
time java -cp "$LUCENE_4_PATH" org.apache.lucene.index.IndexUpgrader doc-index/lucene
echo
echo "===>>>>>migrating lucene data from 4 to 5<<<<<========="
echo
export LUCENE_5_PATH="$JARS_HOME/lucene-backward-codecs-5.3.1.jar:$JARS_HOME/lucene-core-5.3.1.jar"

echo "upgrading facets taxonomy indices from 4 to 5 with command time java -cp $LUCENE_5_PATH org.apache.lucene.index.IndexUpgrader facets"
time java -cp "$LUCENE_5_PATH" org.apache.lucene.index.IndexUpgrader facets
echo
echo "upgrading document indices from 4 to 5 with command time java -cp $LUCENE_5_PATH org.apache.lucene.index.IndexUpgrader doc-index/lucene"
time java -cp "$LUCENE_5_PATH" org.apache.lucene.index.IndexUpgrader doc-index/lucene
echo
echo "done upgrading from lucene 3 to lucene 5"
date "+DATE: %Y-%m-%d%nTIME: %H:%M:%S"

No errors occur.
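The script does not check exit codes, so a failing hop would simply be followed by the next one. A minimal fail-fast sketch of the same four IndexUpgrader runs, driven from plain Java with ProcessBuilder, is below. The class name is hypothetical, the jar paths are copied from the script, and it assumes a failed IndexUpgrader run ends with a non-zero JVM exit code (an exception escaping main does that). Like the script, it uses relative index paths, so it has to be started from the data directory.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical fail-fast driver for the same IndexUpgrader hops as the shell script.
public class TwoHopUpgrade {

    // Builds the four command lines in order: 3 -> 4 with the 4.10.4 core jar,
    // then 4 -> 5 with lucene-backward-codecs plus the 5.3.1 core jar.
    static List<List<String>> commands(String jarsHome) {
        String lucene4 = jarsHome + "/lucene-core-4.10.4.jar";
        String lucene5 = jarsHome + "/lucene-backward-codecs-5.3.1.jar"
                + File.pathSeparator + jarsHome + "/lucene-core-5.3.1.jar";
        List<List<String>> cmds = new ArrayList<>();
        for (String cp : Arrays.asList(lucene4, lucene5)) {
            for (String dir : Arrays.asList("facets", "doc-index/lucene")) {
                cmds.add(Arrays.asList("java", "-cp", cp,
                        "org.apache.lucene.index.IndexUpgrader", dir));
            }
        }
        return cmds;
    }

    // Runs each command in order, sharing this process's stdout/stderr,
    // and stops at the first non-zero exit code.
    static void runAll(List<List<String>> cmds) throws IOException, InterruptedException {
        for (List<String> cmd : cmds) {
            System.out.println("running: " + String.join(" ", cmd));
            int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
            if (exit != 0) {
                throw new IllegalStateException("exit code " + exit + " from: " + String.join(" ", cmd));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        runAll(commands("/users/scott/projects/prototypes/lucene-3-and-5/jars"));
    }
}
```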

At this point, my index files look like Lucene 5 indexes.
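One rough way to back up "look like Lucene 5" without Lucene on the classpath is to list each index directory. This is a heuristic sketch, not a format check, and the class name is hypothetical. It flags .tis/.tii (the 3.x terms dictionary) and .frq/.prx (postings files in 3.x and in the 4.0 format), which current formats replace with .tim/.tip and .doc/.pos/.pay, and it reports whether any per-segment .si files exist, which Lucene writes only from 4.0 on. Files packed inside a compound .cfs file are invisible to it.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Heuristic only: looks at file names, never at file contents.
public class IndexFileCheck {

    // .tis/.tii: 3.x terms dictionary; .frq/.prx: 3.x and 4.0 postings files.
    static final Set<String> OLD_FORMAT_EXTENSIONS =
            new HashSet<>(Arrays.asList("tis", "tii", "frq", "prx"));

    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    // Returns the sorted set of old-format extensions found directly in indexDir.
    static Set<String> legacyExtensions(Path indexDir) throws IOException {
        Set<String> found = new TreeSet<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path f : files) {
                String ext = extension(f.getFileName().toString());
                if (OLD_FORMAT_EXTENSIONS.contains(ext)) {
                    found.add(ext);
                }
            }
        }
        return found;
    }

    // True if at least one per-segment .si file (Lucene 4.0+) is present.
    static boolean hasSegmentInfoFiles(Path indexDir) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, "*.si")) {
            return files.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path dir = Paths.get(arg);
            System.out.println(dir + ": old-format extensions=" + legacyExtensions(dir)
                    + ", has .si files=" + hasSegmentInfoFiles(dir));
        }
    }
}
```

Run from the data directory, e.g. `java IndexFileCheck facets doc-index/lucene`; an upgraded index would be expected to report no old-format extensions and at least one .si file.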

Now I want to validate the upgraded indexes by pulling similar (if not the
same) data from them.



Here are the Maven dependencies for the 5.3.1 source:


<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-facet</artifactId>
    <version>5.3.1</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>5.3.1</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-highlighter</artifactId>
    <version>5.3.1</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queries</artifactId>
    <version>5.3.1</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>5.3.1</version>
</dependency>


Here is my 5.3.1 program. It returns nulls; what am I doing wrong?



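import java.nio.file.Paths;

import org.apache.lucene.facet.FacetResult;
import org.apache.lucene.facet.Facets;
import org.apache.lucene.facet.FacetsCollector;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.LabelAndValue;
import org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts;
import org.apache.lucene.facet.taxonomy.TaxonomyReader;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;

public class FacetRunner {
    public static void main(final String[] args) throws Exception {
        // Upgraded document index
        Directory indexDir = new SimpleFSDirectory(
                Paths.get("/Users/scott/projects/prototypes/lucene-3-and-5/lucene5/data/doc-index/lucene"));
        IndexReader indexReader = DirectoryReader.open(indexDir);
        IndexSearcher searcher = new IndexSearcher(indexReader);

        // Upgraded taxonomy (facet) index
        Directory taxonomyIndexDir = new SimpleFSDirectory(
                Paths.get("/Users/scott/projects/prototypes/lucene-3-and-5/lucene5/data/facets"));
        TaxonomyReader taxo = new DirectoryTaxonomyReader(taxonomyIndexDir);

        Term aTerm = new Term("$facets", "$fulltree$");
        Query q = new TermQuery(aTerm);

        FacetsCollector facetsCollector = new FacetsCollector();

        //searcher.search(q, MultiCollector.wrap(tdc, facetsCollector));
        //FacetsCollector.search(searcher, new MatchAllDocsQuery(), 10, facetsCollector);
        FacetsCollector.search(searcher, q, 10, facetsCollector);

        FacetsConfig config = new FacetsConfig();
        //config.set
        Facets facets = new FastTaxonomyFacetCounts(taxo, config, facetsCollector);
        FacetResult result = facets.getTopChildren(10, "brs_recipient_domain");

        // In 5.x, getTopChildren returns null rather than an empty result when the
        // dimension is not in the taxonomy or nothing was counted for it.
        if (result == null) {
            System.out.println("no facet counts for brs_recipient_domain");
            return;
        }
        for (LabelAndValue labelValue : result.labelValues) {
            System.out.println(String.format("%s (%s)", labelValue.label, labelValue.value));
        }
    }
}
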
Here is the URL of a gzipped tar file that contains the index (not yet upgraded):
https://www.dropbox.com/s/qbr7ogwgekatrdf/faceted_lucene_data.tar.gz?dl=0 

Thanks for your help.

Scott
