[ http://issues.apache.org/jira/browse/LUCENE-550?page=comments#action_12451768 ]
wolfgang hoschek commented on LUCENE-550:
-----------------------------------------
Ok. That means a basic test passes. For some more exhaustive tests, run all the
queries in
src/test/org/apache/lucene/index/memory/testqueries.txt
against matching files such as
String[] files = listFiles(new String[] {
"*.txt", //"*.html", "*.xml", "xdocs/*.xml",
"src/java/test/org/apache/lucene/queryParser/*.java",
"src/java/org/apache/lucene/index/memory/*.java",
});
See testMany() for details. Repeat for various analyzer, stopword, and
toLowerCase settings, such as
boolean toLowerCase = true;
// boolean toLowerCase = false;
// Set stopWords = null;
Set stopWords = StopFilter.makeStopSet(StopAnalyzer.ENGLISH_STOP_WORDS);
Analyzer[] analyzers = new Analyzer[] {
// new SimpleAnalyzer(),
// new StopAnalyzer(),
// new StandardAnalyzer(),
PatternAnalyzer.DEFAULT_ANALYZER,
// new WhitespaceAnalyzer(),
// new PatternAnalyzer(PatternAnalyzer.NON_WORD_PATTERN, false, null),
// new PatternAnalyzer(PatternAnalyzer.NON_WORD_PATTERN, true, stopWords),
// new SnowballAnalyzer("English", StopAnalyzer.ENGLISH_STOP_WORDS),
};
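To make "repeat for various settings" concrete, here is a minimal stand-alone sketch of how the combinations could be enumerated. It does not use Lucene: analyzers are represented by name only, and TestMatrixSketch, Settings, and combinations() are hypothetical names that are not part of the test class. Not every analyzer actually takes both flags, so treat the full cross product as an upper bound.

```java
import java.util.ArrayList;
import java.util.List;

public class TestMatrixSketch {

    /** One combination of settings for a full pass over testqueries.txt. */
    static final class Settings {
        final String analyzerName;   // stands in for an Analyzer instance
        final boolean toLowerCase;
        final boolean useStopWords;  // false corresponds to stopWords = null

        Settings(String analyzerName, boolean toLowerCase, boolean useStopWords) {
            this.analyzerName = analyzerName;
            this.toLowerCase = toLowerCase;
            this.useStopWords = useStopWords;
        }

        @Override
        public String toString() {
            return analyzerName + " toLowerCase=" + toLowerCase
                    + " stopWords=" + (useStopWords ? "ENGLISH_STOP_WORDS" : "null");
        }
    }

    /** Builds every analyzer x toLowerCase x stopWords combination. */
    static List<Settings> combinations(String[] analyzerNames) {
        List<Settings> result = new ArrayList<>();
        boolean[] flags = { true, false };
        for (String name : analyzerNames) {
            for (boolean lower : flags) {
                for (boolean stop : flags) {
                    result.add(new Settings(name, lower, stop));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] analyzerNames = {
            "SimpleAnalyzer", "StopAnalyzer", "StandardAnalyzer",
            "PatternAnalyzer", "WhitespaceAnalyzer", "SnowballAnalyzer",
        };
        for (Settings s : combinations(analyzerNames)) {
            // A real run would construct the analyzer here and then run
            // every query against every matching file.
            System.out.println(s);
        }
    }
}
```

In the actual test you would edit the commented-out lines above by hand instead; the sketch only shows how many distinct runs that implies.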
> InstanciatedIndex - faster but memory consuming index
> -----------------------------------------------------
>
> Key: LUCENE-550
> URL: http://issues.apache.org/jira/browse/LUCENE-550
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Store
> Affects Versions: 1.9
> Reporter: Karl Wettin
> Attachments: class_diagram.png, class_diagram.png,
> instanciated_20060527.tar, InstanciatedIndexTermEnum.java,
> lucene.1.9-karl1.jpg, lucene2-karl_20060722.tar.gz,
> lucene2-karl_20060723.tar.gz
>
>
> After fixing the bugs, it's now 4.5 to 5 times the speed. This is true
> at both index and query time. Sorry if I got your hopes up too much. There
> are still things to be done though. I might not have time to do anything with
> this until next month, so here is the code if anyone wants a peek.
> It is not good enough for Jira yet, but if someone wants to fool around with
> it, here it is. The implementation passes a TermEnum -> TermDocs -> Fields ->
> TermVector comparison against the same data in a Directory.
> When it comes to features, offsets don't exist, and positions are stored in an
> ugly way and have bugs.
> You might notice that norms are float[] and not byte[]. I refactored that
> to see if it would do any good. Bit shifting doesn't take many
> ticks, so I might just revert that.
> I believe the code is quite self-explanatory.
> InstanciatedIndex ii = ..
> ii.new InstanciatedIndexReader();
> ii.addDocument(s).. replace IndexWriter for now.
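
For context on the float[] versus byte[] norms remark in the quoted description: Lucene packs each norm into one byte, a lossy 8-bit float with a 3-bit mantissa and 5-bit exponent, and unpacking it costs only a few shifts. The sketch below follows that scheme as I understand it from Lucene's SmallFloat utility. The class name NormCodecSketch and the method names encode/decode are hypothetical, and this is not the code in the attached tarballs.

```java
public class NormCodecSketch {

    /** Packs a float into one byte: 3 mantissa bits, 5 exponent bits, zero exponent 15. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);          // keep sign, exponent, top 3 mantissa bits
        if (small <= ((63 - 15) << 3)) {
            // Zero and negatives map to 0; tiny positives map to the smallest value.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ((63 - 15) << 3) + 0x100) {
            return -1;                         // overflow: clamp to the largest value
        }
        return (byte) (small - ((63 - 15) << 3));
    }

    /** Unpacks a byte produced by encode() back into a float. */
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);     // move the 8 bits back into position
        bits += (63 - 15) << 24;               // restore the exponent bias
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] samples = { 0.0f, 0.3f, 0.5f, 1.0f };
        for (float f : samples) {
            System.out.println(f + " -> " + decode(encode(f)));
        }
    }
}
```

Powers of two such as 0.5f and 1.0f survive the round trip exactly, while 0.3f comes back as 0.25f. That precision loss is the trade-off a float[] avoids, at four times the memory per norm.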