[jira] Commented: (LUCENE-550) InstanciatedIndex - faster but memory consuming index

wolfgang hoschek (JIRA) Tue, 21 Nov 2006 18:19:13 -0800

    [ http://issues.apache.org/jira/browse/LUCENE-550?page=comments#action_12451768 ]
            
wolfgang hoschek commented on LUCENE-550:
-----------------------------------------


OK, that means the basic test passes. For more exhaustive testing, run all the 
queries in 

src/test/org/apache/lucene/index/memory/testqueries.txt

against matching files such as 

    String[] files = listFiles(new String[] {
      "*.txt", //"*.html", "*.xml", "xdocs/*.xml", 
      "src/java/test/org/apache/lucene/queryParser/*.java",
      "src/java/org/apache/lucene/index/memory/*.java",
    });
 

See testMany() for details. Repeat for various analyzer, stop word, and 
toLowerCase settings, such as 

    boolean toLowerCase = true;
//    boolean toLowerCase = false;
//    Set stopWords = null;
    Set stopWords = StopFilter.makeStopSet(StopAnalyzer.ENGLISH_STOP_WORDS);
    
    Analyzer[] analyzers = new Analyzer[] { 
//        new SimpleAnalyzer(),
//        new StopAnalyzer(),
//        new StandardAnalyzer(),
        PatternAnalyzer.DEFAULT_ANALYZER,
//        new WhitespaceAnalyzer(),
//        new PatternAnalyzer(PatternAnalyzer.NON_WORD_PATTERN, false, null),
//        new PatternAnalyzer(PatternAnalyzer.NON_WORD_PATTERN, true, stopWords),
//        new SnowballAnalyzer("English", StopAnalyzer.ENGLISH_STOP_WORDS),
    };
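
The repetition described above can be pictured as a nested sweep over settings, 
files, and queries: each combination is scored once by a reference index and 
once by the index under test, and any disagreement is recorded. What follows is 
only a minimal sketch under stated assumptions. The Scorer interface, the 
sweep() method, and the toy scorers in main() are hypothetical stand-ins and 
are not part of Lucene or of testMany(); a real run would plug in a 
Directory-backed index as the reference and the index being tested as the 
candidate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class IndexComparisonSweep {

    /**
     * Hypothetical stand-in for "index this text with these settings,
     * run this query, and return the score".
     */
    interface Scorer {
        float score(String analyzerName, boolean toLowerCase, boolean useStopWords,
                    String text, String query);
    }

    /** Runs every settings x text x query combination and records disagreements. */
    static List<String> sweep(List<String> analyzerNames, List<String> texts,
                              List<String> queries, Scorer reference, Scorer candidate) {
        List<String> mismatches = new ArrayList<>();
        boolean[] flags = {true, false};
        for (String analyzer : analyzerNames) {
            for (boolean toLowerCase : flags) {
                for (boolean useStopWords : flags) {
                    for (String text : texts) {
                        for (String query : queries) {
                            float expected = reference.score(
                                analyzer, toLowerCase, useStopWords, text, query);
                            float actual = candidate.score(
                                analyzer, toLowerCase, useStopWords, text, query);
                            if (Float.compare(expected, actual) != 0) {
                                mismatches.add(analyzer
                                    + " toLowerCase=" + toLowerCase
                                    + " stopWords=" + useStopWords
                                    + " query=" + query);
                            }
                        }
                    }
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Toy reference: substring match that honors the toLowerCase setting.
        Scorer reference = (a, lower, stop, text, q) ->
            (lower ? text.toLowerCase(Locale.ROOT) : text).contains(q) ? 1.0f : 0.0f;
        // Toy candidate with a deliberate bug: it ignores toLowerCase.
        Scorer candidate = (a, lower, stop, text, q) ->
            text.contains(q) ? 1.0f : 0.0f;

        List<String> mismatches = sweep(
            Arrays.asList("SimpleAnalyzer"),   // labels only in this sketch
            Arrays.asList("Hello World"),
            Arrays.asList("hello"),
            reference, candidate);
        for (String m : mismatches) {
            System.out.println("MISMATCH: " + m);
        }
    }
}
```

With these toy scorers the candidate disagrees with the reference only for the 
combinations where toLowerCase is true, which is the kind of settings-dependent 
bug the repeated runs are meant to surface.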
 


> InstanciatedIndex - faster but memory consuming index
> -----------------------------------------------------
>
>                 Key: LUCENE-550
>                 URL: http://issues.apache.org/jira/browse/LUCENE-550
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 1.9
>            Reporter: Karl Wettin
>         Attachments: class_diagram.png, class_diagram.png, 
> instanciated_20060527.tar, InstanciatedIndexTermEnum.java, 
> lucene.1.9-karl1.jpg, lucene2-karl_20060722.tar.gz, 
> lucene2-karl_20060723.tar.gz
>
>
> After fixing the bugs, it's now 4.5 -> 5 times the speed. This is true at 
> both index and query time. Sorry if I got your hopes up too much. There 
> are still things to be done though. I might not have time to do anything with 
> this until next month, so here is the code if anyone wants a peek.
> It's not good enough for Jira yet, but if someone wants to fool around with it, 
> here it is. The implementation passes a TermEnum -> TermDocs -> Fields -> 
> TermVector comparison against the same data in a Directory.
> When it comes to features, offsets don't exist, and positions are stored in 
> an ugly way and have bugs.
> You might notice that norms are float[] and not byte[]. I refactored that 
> to see if it would do any good. Bit shifting doesn't take many 
> ticks, so I might just revert that.
> I believe the code is quite self-explanatory.
> InstanciatedIndex ii = ..
> ii.new InstanciatedIndexReader();
> ii.addDocument(s).. replace IndexWriter for now.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

