[ http://issues.apache.org/jira/browse/LUCENE-550?page=comments#action_12451768 ]

wolfgang hoschek commented on LUCENE-550:
-----------------------------------------
Ok. That means a basic test passes. For more exhaustive tests, run all the queries in src/test/org/apache/lucene/index/memory/testqueries.txt against matching files, such as:

    String[] files = listFiles(new String[] {
        "*.txt",
        //"*.html",
        "*.xml",
        "xdocs/*.xml",
        "src/java/test/org/apache/lucene/queryParser/*.java",
        "src/java/org/apache/lucene/index/memory/*.java",
    });

See testMany() for details. Repeat for various analyzer, stopword and toLowerCase settings, such as:

    boolean toLowerCase = true;
    // boolean toLowerCase = false;
    // Set stopWords = null;
    Set stopWords = StopFilter.makeStopSet(StopAnalyzer.ENGLISH_STOP_WORDS);

    Analyzer[] analyzers = new Analyzer[] {
        // new SimpleAnalyzer(),
        // new StopAnalyzer(),
        // new StandardAnalyzer(),
        PatternAnalyzer.DEFAULT_ANALYZER,
        // new WhitespaceAnalyzer(),
        // new PatternAnalyzer(PatternAnalyzer.NON_WORD_PATTERN, false, null),
        // new PatternAnalyzer(PatternAnalyzer.NON_WORD_PATTERN, true, stopWords),
        // new SnowballAnalyzer("English", StopAnalyzer.ENGLISH_STOP_WORDS),
    };

> InstanciatedIndex - faster but memory consuming index
> -----------------------------------------------------
>
>                 Key: LUCENE-550
>                 URL: http://issues.apache.org/jira/browse/LUCENE-550
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 1.9
>            Reporter: Karl Wettin
>         Attachments: class_diagram.png, class_diagram.png, instanciated_20060527.tar, InstanciatedIndexTermEnum.java, lucene.1.9-karl1.jpg, lucene2-karl_20060722.tar.gz, lucene2-karl_20060723.tar.gz
>
>
> After fixing the bugs, it's now 4.5 to 5 times the speed, both at index time and at query time. Sorry if I got your hopes up too much. There are still things to be done, though. I might not have time to do anything with this until next month, so here is the code if anyone wants a peek.
> Not good enough for Jira yet, but if someone wants to fool around with it, here it is.
> The implementation passes a TermEnum -> TermDocs -> Fields -> TermVector comparison against the same data in a Directory.
> When it comes to features, offsets don't exist, and positions are stored in an ugly way and have bugs.
> You might notice that norms are float[] and not byte[]. That was me refactoring it to see if it would do any good. Bit shifting doesn't take many ticks, so I might just revert that.
> I believe the code is quite self-explanatory.
> InstanciatedIndex ii = ..
> ii.new InstanciatedIndexReader();
> ii.addDocument(s).. replaces IndexWriter for now.
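The comparison mentioned in the quoted description, and the exhaustive runs suggested above, both come down to checking that two index implementations expose the same data. A minimal sketch of the first two stages, assuming each side can be viewed as a sorted map from term to the ascending document ids that contain it (a stand-in for walking a TermEnum and its TermDocs). PostingsComparison and firstMismatch are hypothetical names, and fields and term vectors are left out.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.SortedMap;

/**
 * Hypothetical consistency check: both index views are modeled as sorted
 * maps from term to the ascending list of document ids containing it.
 */
final class PostingsComparison {
    private PostingsComparison() {}

    /** Returns null if both views agree, else a description of the first mismatch. */
    static String firstMismatch(SortedMap<String, List<Integer>> expected,
                                SortedMap<String, List<Integer>> actual) {
        Iterator<Map.Entry<String, List<Integer>>> e = expected.entrySet().iterator();
        Iterator<Map.Entry<String, List<Integer>>> a = actual.entrySet().iterator();
        // Walk both term enumerations in lockstep, like two TermEnums.
        while (e.hasNext() && a.hasNext()) {
            Map.Entry<String, List<Integer>> te = e.next();
            Map.Entry<String, List<Integer>> ta = a.next();
            if (!te.getKey().equals(ta.getKey())) {
                return "term mismatch: " + te.getKey() + " vs " + ta.getKey();
            }
            // Same term on both sides: compare its postings, like two TermDocs.
            if (!Objects.equals(te.getValue(), ta.getValue())) {
                return "postings mismatch for " + te.getKey();
            }
        }
        // One side ran out of terms before the other.
        if (e.hasNext()) {
            return "missing term: " + e.next().getKey();
        }
        if (a.hasNext()) {
            return "extra term: " + a.next().getKey();
        }
        return null;
    }
}
```

Reporting the first mismatch rather than a boolean makes it easier to see which term or document diverged when a run over many files and analyzer settings fails.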
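The quoted description says norms are kept as float[] rather than byte[], and that the bit shifting needed to decode a byte is cheap. A minimal sketch of one such one-byte encoding, modeled on the lossy scheme Lucene's Similarity.encodeNorm has used (three significant bits counting the implicit leading 1, zero-exponent point 15). The class NormCodec and its methods are hypothetical, not code from the attachments.

```java
/**
 * Hypothetical helper: packs a non-negative float into one byte by keeping
 * only its exponent and top two stored mantissa bits.
 */
final class NormCodec {
    private static final int MANTISSA_BITS = 3;
    private static final int ZERO_EXP = 15;
    // Offset subtracted so the usable range starts at code 1 (code 0 means zero).
    private static final int FZERO = (63 - ZERO_EXP) << MANTISSA_BITS;

    private NormCodec() {}

    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        // Drop the low 21 mantissa bits: sign, 8 exponent bits, 2 mantissa bits remain.
        int small = bits >> (24 - MANTISSA_BITS);
        if (small <= FZERO) {
            // Non-positive values map to 0; tiny positive values to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        } else if (small >= FZERO + 0x100) {
            // Too large: saturate to the largest code, 0xFF.
            return (byte) -1;
        }
        return (byte) (small - FZERO);
    }

    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        // Shift the unsigned code back into place and restore the exponent offset.
        int bits = (b & 0xff) << (24 - MANTISSA_BITS);
        bits += (63 - ZERO_EXP) << 24;
        return Float.intBitsToFloat(bits);
    }
}
```

With this encoding 1.0f and 0.5f round-trip exactly, while 0.3f comes back as 0.25f. A byte[] of norms takes a quarter of the memory of a float[], and decoding costs a shift, an add and Float.intBitsToFloat (or a lookup in a 256-entry table precomputed from decode).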
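The usage notes in the quote rely on Java's inner-class syntax: ii.new InstanciatedIndexReader() creates a reader bound to the particular index ii. A minimal sketch of that pattern, with a hypothetical ToyIndex that stores whole strings as documents; it only illustrates the language feature, not how InstanciatedIndex is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory index whose reader is a non-static inner class. */
final class ToyIndex {
    private final List<String> documents = new ArrayList<>();

    /** Plays the role IndexWriter.addDocument would otherwise play. */
    void addDocument(String text) {
        documents.add(text);
    }

    /** Each Reader is tied to the ToyIndex instance it was created from. */
    final class Reader {
        int numDocs() {
            // Reads the enclosing instance's list directly (ToyIndex.this.documents).
            return documents.size();
        }

        String document(int n) {
            return documents.get(n);
        }
    }
}
```

Because this reader reads the enclosing index's live list, a reader created before addDocument also sees later documents in this sketch; the issue does not say whether InstanciatedIndexReader behaves that way.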