Re: The tvp extension

2007-01-09 Thread Bernhard Messer
Term vectors with positions are written to the tvf file, like all other 
term vector information. There is no extra file containing term vector 
position information. The tvp extension seems to be a relic from earlier 
days, when Lucene file extensions were spread over several class files, 
e.g. SegmentReader.java.


I will remove the tvp extension so that nobody gets confused. Thanks 
to Nicolas for reporting the bug.


Bernhard


I don't have the sources handy to check, but my guess would be this extension 
is/was for Term Vectors with Positions.
Like somebody else said recently, it would be good to make all these into static finals, 
so we don't have to chase string values in the code.

Otis

----- Original Message -----
From: Nicolas Lalevée [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Monday, January 8, 2007 12:10:32 AM
Subject: The tvp extension


Hello,

In o.a.l.index.IndexFileNames.java, there are these lines :

  static final String INDEX_EXTENSIONS[] = new String[] {
      "cfs", "fnm", "fdx", "fdt", "tii", "tis", "frq", "prx", "del",
      "tvx", "tvd", "tvf", "tvp", "gen", "nrm"
  };


What is the tvp extension? I didn't find any occurrence of it in the docs 
or in the code. Is it a bug?


cheers,
Nicolas

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




[jira] Created: (LUCENE-758) IndexReader.isCurrent fails when using two IndexReaders

2006-12-22 Thread Bernhard Messer (JIRA)
IndexReader.isCurrent fails when using two IndexReaders
---

             Key: LUCENE-758
             URL: http://issues.apache.org/jira/browse/LUCENE-758
         Project: Lucene - Java
      Issue Type: Bug
Affects Versions: 2.0.1
     Environment: Operating System: other
                  Platform: other
        Reporter: Bernhard Messer
        Priority: Minor


There is a problem in IndexReader.isCurrent() when using two reader instances 
where one of them is based on a RAMDirectory. If there is an index and we open 
two IndexReaders, one based on an FSDirectory and the other based on a 
RAMDirectory, the IndexReader using the RAMDirectory does not recognize when 
the underlying index has changed. The method IndexReader.isCurrent() always 
returns true. The testcase below shows the problem.

I did not find an ideal solution. I think the best way would be to change 
the IndexReader.isCurrent() implementation from:

  public boolean isCurrent() throws IOException {
    return SegmentInfos.readCurrentVersion(directory) == 
        segmentInfos.getVersion();
  }

to something like:

  public boolean isCurrent() throws IOException {
    return directory.readCurrentVersion() == segmentInfos.getVersion();
  }

As far as I can see, this would work for both FSDirectory and RAMDirectory. 
But then the implementing Directory classes would have to know about segment 
files and their formatting details.
What do others think?

  /**
   * Additional testcase for IndexReaderTest to show the problem when using
   * two different readers.
   */
  public void testIsCurrentWithCombined() throws Exception {
    String tempDir = System.getProperty("tempDir");
    if (tempDir == null)
      throw new IOException("tempDir undefined, cannot run test");

    File indexDir = new File(tempDir, "lucenetestiscurrent");
    Directory fsStore = FSDirectory.getDirectory(indexDir, true);

    IndexWriter writer = new IndexWriter(fsStore, new SimpleAnalyzer(), true);
    addDocumentWithFields(writer);
    writer.close();

    IndexReader reader1 = IndexReader.open(fsStore);
    IndexReader reader2 = IndexReader.open(new RAMDirectory(fsStore));

    assertTrue(reader1.isCurrent());
    assertTrue(reader2.isCurrent());

    reader1.deleteDocument(0);
    reader1.close();

    // BUG: the reader based on the RAMDirectory does not recognize the
    // index change.
    assertFalse(reader2.isCurrent());

    reader2.close();
  }
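
A minimal sketch of why a reader opened on a copy keeps reporting itself as current. The Store and Reader classes here are hypothetical stand-ins, not Lucene classes; the only assumption is that the copy carries its own version number, which nothing updates when the original changes.

```java
class IsCurrentSketch {
    /** Hypothetical stand-in for a directory that stores an index version. */
    static class Store {
        long version = 1;

        /** Snapshot copy, comparable to loading an on-disk index into memory. */
        Store copy() {
            Store snapshot = new Store();
            snapshot.version = version;
            return snapshot;
        }
    }

    /** Hypothetical reader that remembers the version it was opened at. */
    static class Reader {
        final Store store;
        final long openedAt;

        Reader(Store store) {
            this.store = store;
            this.openedAt = store.version;
        }

        /** Compares against the reader's own store, never against the original. */
        boolean isCurrent() {
            return store.version == openedAt;
        }
    }

    public static void main(String[] args) {
        Store fs = new Store();
        Reader onOriginal = new Reader(fs);
        Reader onCopy = new Reader(fs.copy());

        fs.version++; // the original index changes

        System.out.println("reader on original is current: " + onOriginal.isCurrent());
        System.out.println("reader on copy is current:     " + onCopy.isCurrent());
    }
}
```

In this sketch the reader on the copy can only notice the change if it is told where the original lives, which is the design question raised in the report.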



-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira






Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-18 Thread Bernhard Messer

Otis,

I ran into a similar problem when running a very heavily loaded search 
application in a servlet container. The reason for using ThreadLocals was 
to get rid of synchronized method calls, e.g. in TermVectorsReader, which 
would degrade overall search performance. Currently I do not see 
an easy solution that fixes both the synchronization and the ThreadLocal 
problem.


Bernhard

Otis Gospodnetic wrote:

Moving to java-dev, I think this belongs here.
I've been looking at this problem some more today and reading about 
ThreadLocals.  It's easy to misuse them and end up with memory leaks, 
apparently... and I think we may have this problem here.

The problem here is that ThreadLocals are tied to Threads, and I think the 
assumption in TermInfosReader and SegmentReader is that (search) Threads are 
short-lived: they come in, scan the index, do the search, return and die.  In 
this scenario, their ThreadLocals go to heaven with them, too, and memory is 
freed up.

But when Threads are long-lived, as they are in thread pools (e.g. those in 
servlet containers), those ThreadLocals stay alive even after a single search 
request is done.  Moreover, the Thread is reused, and the new TermInfosReader 
and SegmentReader put some new values in that ThreadLocal on top of the old 
values (I think) from the previous search request.  Because the Thread still 
has references to ThreadLocals and the values in them, the values never get 
GCed.
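
The retention described above can be reproduced outside Lucene. The following is a minimal sketch, not Lucene code: CACHE is a hypothetical stand-in for the per-thread value a reader stores, and a single-thread executor plays the role of the servlet container's thread pool. It shows that a value set during one task is still reachable from the same worker thread during the next task unless it is removed explicitly.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PooledThreadLocalSketch {
    // Hypothetical per-thread cache; stands in for the cloned values kept per thread.
    static final ThreadLocal<byte[]> CACHE = new ThreadLocal<>();

    /**
     * Runs two "requests" on the same pooled thread and reports whether the
     * second one still sees the value stored by the first.
     */
    static boolean valueSurvivesRequest(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // First request: fill the thread-local, optionally clean up at the end.
            pool.submit(() -> {
                CACHE.set(new byte[1024]);
                if (cleanUp) {
                    CACHE.remove();
                }
            }).get();
            // Second request: runs on the same worker thread.
            return pool.submit(() -> CACHE.get() != null).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("value survives without cleanup: " + valueSurvivesRequest(false));
        System.out.println("value survives with cleanup:    " + valueSurvivesRequest(true));
    }
}
```

Calling remove() at the end of each request is the kind of explicit cleanup hook discussed in the next paragraph; whether it can be exposed cleanly is a separate question.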

I tried making ThreadLocals in TIR and SR static, I tried wrapping values saved 
in TLs in WeakReference, I've tried using WeakHashMap like in Robert Engel's 
FixedThreadLocal class from LUCENE-436, but nothing helped.  I thought about 
adding a public static method to TIR and SR, so one could call it at the end of 
a search request (think servlet filter) and clear the TL for the current 
thread, but that would require making TIR and SR public and I'm not 100% sure 
if it would work, plus that exposes the implementation details too much.
I don't have a solution yet.
But do we *really* need ThreadLocal in TIR and SR?  The only thing that TL is 
doing there is acting as a per-thread storage of some cloned value (in TIR we 
clone SegmentTermEnum and in SR we clone TermVectorsReader).  Why can't we just 
store those cloned values in instance variables?  Isn't whoever is calling TIR 
and SR going to be calling the same instance of TIR and SR anyway, and thus get 
access to those cloned values?

I'm really amazed that we haven't heard any reports about this before.  I am 
not sure why my application started showing this leak only about 3 weeks ago.  
It is getting more pounded on than before, so maybe that made the leak more 
obvious.  My guess is that more common Lucene usage is with a single index or a 
small number of them, and with short-lived threads, where this problem isn't 
easily visible.  In my case I deal with a few tens of thousands of indices and 
several parallel search threads that live forever in the thread pool.

Any thoughts about this or possible suggestions for a fix?
Thanks,
Otis



----- Original Message -----
From: Otis Gospodnetic [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Friday, December 15, 2006 12:28:29 PM
Subject: Leaking org.apache.lucene.index.* objects

Hi,

About 2-3 weeks ago I emailed about a memory leak in my application.  I then found some 
problems in my code (I wasn't closing IndexSearchers explicitly) and took care of those.  
Now I see my app is still leaking memory - jconsole clearly shows the Tenured 
Gen memory pool getting filled up until I hit the OOM, but I can't seem to 
pin-point the source.

I found that a bunch of o.a.l.index.* objects are not getting GCed, even though 
they should.  For example:

$ jmap -histo:live 7825 | grep apache.lucene.index | head -20 | sort -k2 -nr
 num   #instances    #bytes  class name
---------------------------------------
   4:     1764840  98831040  org.apache.lucene.index.CompoundFileReader$CSIndexInput
   5:     2119215  67814880  org.apache.lucene.index.TermInfo
   7:     1112459  35598688  org.apache.lucene.index.SegmentReader$Norm
   9:     2132311  34116976  org.apache.lucene.index.Term
  12:     1117897  26829528  org.apache.lucene.index.FieldInfo
  13:      225340  18027200  org.apache.lucene.index.SegmentTermEnum
  15:      589727  14153448  org.apache.lucene.index.TermBuffer
  21:       86033   8718504  [Lorg.apache.lucene.index.TermInfo;
  20:       86033   8718504  [Lorg.apache.lucene.index.Term;
  23:       86120   7578560  org.apache.lucene.index.SegmentReader
  26:       90501   5068056  org.apache.lucene.store.FSIndexInput
  27:       86120   4822720  org.apache.lucene.index.TermInfosReader
  33:       86130   3445200  org.apache.lucene.index.SegmentInfo
  36:       87355   2795360  org.apache.lucene.store.FSIndexInput$Descriptor
  38:       86120   2755840  org.apache.lucene.index.FieldsReader
  39:       86050   2753600  org.apache.lucene.index.CompoundFileReader
  42:       46903   2251344  

Re: FuzzyQuery / PriorityQueue BUG

2006-03-01 Thread Bernhard Messer

Jörg,

could you please add this to JIRA so that it doesn't get lost? If you 
have a patch and/or a testcase showing the problem, it would be great if 
you attached it to the JIRA issue as well.


thanks,
Bernhard

Jörg Henß wrote:


Hi,
FuzzyQuery produces a java.lang.NegativeArraySizeException in
PriorityQueue.initialize if I use Integer.MAX_VALUE as
BooleanQuery.MaxClauseCount. This is because it adds 1 to MaxClauseCount and
tries to allocate an array of this size (I think it overflows to MIN_VALUE).
Usually nobody needs so many clauses, but I think this should be caught
somehow. Perhaps an error such as "your MaxClauseCount is too large" could do
it, so the user knows where to find the problem.
Greets
Joerg
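
The overflow described above is easy to reproduce in isolation. The sketch below is not the Lucene PriorityQueue; arrayLengthFor is a hypothetical helper that shows one way the size could be checked before the "+ 1" allocation, so that the caller gets a descriptive error instead of a NegativeArraySizeException.

```java
class QueueSizeGuard {
    /**
     * Hypothetical helper: computes the backing-array length for a queue
     * that allocates maxSize + 1 slots, rejecting sizes that would overflow.
     */
    static int arrayLengthFor(int maxSize) {
        if (maxSize < 0 || maxSize == Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "maxSize " + maxSize + " is out of range; is MaxClauseCount too large?");
        }
        return maxSize + 1;
    }

    public static void main(String[] args) {
        // int arithmetic wraps around silently: MAX_VALUE + 1 is MIN_VALUE.
        int overflowed = Integer.MAX_VALUE + 1;
        System.out.println(overflowed == Integer.MIN_VALUE);

        System.out.println(arrayLengthFor(1024));
        try {
            arrayLengthFor(Integer.MAX_VALUE);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```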





[jira] Commented: (LUCENE-475) RAMDirectory(Directory dir, boolean closeDir) constructor uses memory inefficiently.

2005-12-01 Thread Bernhard Messer (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-475?page=comments#action_12359083 ] 

Bernhard Messer commented on LUCENE-475:


I like the patch and find it very helpful if one tries to load larger indices 
into RAMDirectory.

Hoss Man,

why would you like to have a new constructor to adjust the internal buffer 
size? I do not see any reason to make the buffer size configurable from 
outside. The tests I made with different sizes didn't show any difference in 
performance or disk usage. The new implementation would be similar to 
BufferedIndexOutput, where the internal buffer size can't be changed either. 
Am I missing something?


  RAMDirectory(Directory dir, boolean closeDir)  constructor uses memory 
 inefficiently.
 --

  Key: LUCENE-475
  URL: http://issues.apache.org/jira/browse/LUCENE-475
  Project: Lucene - Java
 Type: Improvement
   Components: Store
 Reporter: Volodymyr Bychkoviak
  Attachments: RamDirectory.diff

 Recently I found that the RAMDirectory(Directory dir, boolean closeDir) 
 constructor uses memory inefficiently.
 Files from the source index are read entirely into memory as a single byte 
 array, which is thrown away afterwards. So if I want to load my 200 MB 
 optimized, compound-format index into memory for faster search, I have to 
 give the JVM a memory limit of at least 400 MB. For larger indexes this can 
 be an issue.
 I've attached a patch that solves this problem.
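
The general idea of copying through a small fixed-size buffer, instead of reading each file into one large temporary array, can be sketched with plain java.io streams. This is not the attached patch and does not use the Lucene Directory API; the class name, the copyOf helper and the 1024-byte BUFFER_SIZE are assumptions chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class ChunkedCopySketch {
    // Assumed chunk size for illustration; not taken from the attached patch.
    static final int BUFFER_SIZE = 1024;

    /** Copies all bytes from in to out through one small reusable buffer. */
    static long copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Convenience wrapper used for the demonstration below. */
    static byte[] copyOf(byte[] source) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(source), out);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] source = new byte[5000];
        for (int i = 0; i < source.length; i++) {
            source[i] = (byte) i;
        }
        System.out.println(copyOf(source).length);
    }
}
```

With this pattern the temporary memory needed during the copy is bounded by the buffer size rather than by the size of the largest file.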




class org.apache.lucene.index.TermInfosTest

2005-12-01 Thread Bernhard Messer
I just found a class org.apache.lucene.index.TermInfosTest within the 
src/test directory. It seems to be a relic from earlier days. It 
doesn't run without a resource file words.txt and is not a JUnit test. I 
would like to delete it if nobody raises a hand to stop me.


regards
Bernhard





Re: svn commit: r332431 - in /lucene/java/trunk: CHANGES.txt src/java/org/apache/lucene/search/FieldDocSortedHitQueue.java src/test/org/apache/lucene/search/TestCustomSearcherSort.java

2005-11-15 Thread Bernhard Messer

Yonik,

TestCustomSearcherSort.java, which you added a few days ago, names 
Martin Seitz from T-Systems as its author and doesn't have the Apache 
license agreement in its header. Is it OK to have this test in SVN?


Bernhard


[EMAIL PROTECTED] wrote:


Author: yonik
Date: Thu Nov 10 19:13:10 2005
New Revision: 332431

URL: http://svn.apache.org/viewcvs?rev=332431&view=rev
Log:
break sorting ties by index order: LUCENE-456

Added:
   lucene/java/trunk/src/test/org/apache/lucene/search/TestCustomSearcherSort.java
Modified:
   lucene/java/trunk/CHANGES.txt
   lucene/java/trunk/src/java/org/apache/lucene/search/FieldDocSortedHitQueue.java

Modified: lucene/java/trunk/CHANGES.txt
URL: http://svn.apache.org/viewcvs/lucene/java/trunk/CHANGES.txt?rev=332431&r1=332430&r2=332431&view=diff
==
--- lucene/java/trunk/CHANGES.txt (original)
+++ lucene/java/trunk/CHANGES.txt Thu Nov 10 19:13:10 2005
@@ -245,6 +245,10 @@
change the sort order when sorting by string for documents without
a value for the sort field.
(Luc Vanlerberghe via Yonik, LUCENE-453)
+
+16. Fixed a sorting problem with MultiSearchers that can lead to
+missing or duplicate docs due to equal docs sorting in an arbitrary order.
+(Yonik Seeley, LUCENE-456)

Optimizations
 


Modified: lucene/java/trunk/src/java/org/apache/lucene/search/FieldDocSortedHitQueue.java
URL: http://svn.apache.org/viewcvs/lucene/java/trunk/src/java/org/apache/lucene/search/FieldDocSortedHitQueue.java?rev=332431&r1=332430&r2=332431&view=diff
==
--- lucene/java/trunk/src/java/org/apache/lucene/search/FieldDocSortedHitQueue.java (original)
+++ lucene/java/trunk/src/java/org/apache/lucene/search/FieldDocSortedHitQueue.java Thu Nov 10 19:13:10 2005
@@ -157,6 +157,11 @@
c = -c;
}
}
-   return c > 0;
+
+// avoid random sort order that could lead to duplicates (bug #31241):
+if (c == 0)
+  return docA.doc > docB.doc;
+
+return c > 0;
}
}

Added: lucene/java/trunk/src/test/org/apache/lucene/search/TestCustomSearcherSort.java
URL: http://svn.apache.org/viewcvs/lucene/java/trunk/src/test/org/apache/lucene/search/TestCustomSearcherSort.java?rev=332431&view=auto
==
--- lucene/java/trunk/src/test/org/apache/lucene/search/TestCustomSearcherSort.java (added)
+++ lucene/java/trunk/src/test/org/apache/lucene/search/TestCustomSearcherSort.java Thu Nov 10 19:13:10 2005
@@ -0,0 +1,268 @@
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.Calendar;
+import java.util.GregorianCalendar;
+import java.util.Map;
+import java.util.Random;
+import java.util.TreeMap;
+
+import junit.framework.Test;
+import junit.framework.TestCase;
+import junit.framework.TestSuite;
+import junit.textui.TestRunner;
+
+import org.apache.lucene.analysis.standard.StandardAnalyzer;
+import org.apache.lucene.document.DateTools;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.Field;
+import org.apache.lucene.index.IndexReader;
+import org.apache.lucene.index.IndexWriter;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.RAMDirectory;
+
+/**
+ * Unit test for sorting code.
+ *
+ * @author  Martin Seitz (T-Systems)
+ */
+
+public class TestCustomSearcherSort
+extends TestCase
+implements Serializable {
+
+private Directory index = null;
+private Query query = null;
+// reduced from 2 to 2000 to speed up test...
+private final static int INDEX_SIZE = 2000;
+
+   public TestCustomSearcherSort (String name) {
+   super (name);
+   }
+
+   public static void main (String[] argv) {
+   TestRunner.run (suite());
+   }
+
+   public static Test suite() {
+   return new TestSuite (TestCustomSearcherSort.class);
+   }
+
+
+   // create an index for testing
+   private Directory getIndex()
+   throws IOException {
+   RAMDirectory indexStore = new RAMDirectory ();
+   IndexWriter writer = new IndexWriter (indexStore, new StandardAnalyzer(), true);
+   RandomGen random = new RandomGen();
+   for (int i=0; i<INDEX_SIZE; ++i) { // don't decrease; if to low the problem doesn't show up
+   Document doc = new Document();
+   if((i%5)!=0) { // some documents must not have an entry in the first sort field
+   doc.add (new Field("publicationDate_", random.getLuceneDate(), Field.Store.YES, Field.Index.UN_TOKENIZED));
+   }
+	if((i%7)==0) { // some documents to match the query (see below) 
+	doc.add (new 

[jira] Commented: (LUCENE-455) FieldsReader does not regard offset and position flags

2005-10-19 Thread Bernhard Messer (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-455?page=comments#action_12332492 ] 

Bernhard Messer commented on LUCENE-455:


Frank,

thanks for the patch. I've reviewed and committed it.

Bernhard

 FieldsReader does not regard offset and position flags
 --

  Key: LUCENE-455
  URL: http://issues.apache.org/jira/browse/LUCENE-455
  Project: Lucene - Java
 Type: Bug
   Components: Index
 Versions: 1.9
 Reporter: Frank Steinmann
 Priority: Minor
  Attachments: FieldsReader.java

 When creating a Field the FieldsReader looks at the storeTermVector flag of 
 the FieldInfo. If true Field.TermVector.YES is used as parameter. But it 
 should be checked if storeOffsetWithTermVector and 
 storePositionWithTermVector are set and Field.TermVector.WITH_OFFSETS, 
 ...WITH_POSITIONS, or ...WITH_POSITIONS_OFFSETS should be used as appropriate.
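
The mapping the report asks for can be sketched as a small decision function. The TermVector enum below is a local stand-in that mirrors the option names mentioned in the report, and fromFlags is a hypothetical helper; neither is the code from the attached FieldsReader.java.

```java
class TermVectorFlagsSketch {
    /** Local stand-in mirroring the Field.TermVector options named in the report. */
    enum TermVector { NO, YES, WITH_POSITIONS, WITH_OFFSETS, WITH_POSITIONS_OFFSETS }

    /** Hypothetical helper: picks the option that matches the three stored flags. */
    static TermVector fromFlags(boolean storeTermVector,
                                boolean storeOffsetWithTermVector,
                                boolean storePositionWithTermVector) {
        if (!storeTermVector) {
            return TermVector.NO;
        }
        if (storeOffsetWithTermVector && storePositionWithTermVector) {
            return TermVector.WITH_POSITIONS_OFFSETS;
        }
        if (storeOffsetWithTermVector) {
            return TermVector.WITH_OFFSETS;
        }
        if (storePositionWithTermVector) {
            return TermVector.WITH_POSITIONS;
        }
        return TermVector.YES;
    }

    public static void main(String[] args) {
        System.out.println(fromFlags(true, false, false));
        System.out.println(fromFlags(true, true, false));
        System.out.println(fromFlags(true, false, true));
        System.out.println(fromFlags(true, true, true));
    }
}
```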




Re: Exception in full text search

2005-06-09 Thread Bernhard Messer

hi,

Luke is an open-source utility which allows you to analyze and modify 
Lucene's index internals. It can be downloaded from 
http://www.getopt.org/luke/


Bernhard


avrootshell wrote:


Hello,

   I'm able to create an index file for full-text search, and I'm sure it 
has the required entries, as I have traced the traversal path through 
the tables I have specified. Documents are also added to the index 
file.


But when I specify a string to search for, it throws an exception like 
this:



.E
Time: 0.234
There was 1 error:
1) testSrch(com.board.fts.FtsSearchCmdTest)java.lang.NullPointerException: null values not allowed
at org.apache.commons.collections.map.ReferenceMap.put(ReferenceMap.java:571)
at com.sandra.servicer.txtsrch.SrchMan.search(SrchMan.java:108)
at com.board.fts.FtsSearchCmd.execute(FtsSearchCmd.java)
at com.board.fts.FtsSearchCmdTest.testSrch(FtsSearchCmdTest.java)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at com.board.fts.FtsSearchCmdTest.main(FtsSearchCmdTest.java)

FAILURES!!!
Tests run: 1,  Failures: 0,  Errors: 1


Is there any way to view the contents of the index file which has been 
created?
If anyone has suggestions for this kind of error, I would appreciate it.


TIA,





Re: IndexFileNames

2005-06-07 Thread Bernhard Messer

Doug Cutting wrote:


[EMAIL PROTECTED] wrote:

--- lucene/java/trunk/src/java/org/apache/lucene/store/FSDirectory.java (original)
+++ lucene/java/trunk/src/java/org/apache/lucene/store/FSDirectory.java Mon Jun  6 10:52:12 2005

@@ -52,8 +52,8 @@
 if (name.endsWith("."+IndexReader.FILENAME_EXTENSIONS[i]))
   return true;
   }
-  if (name.equals("deletable")) return true;
-  else if (name.equals("segments")) return true;
+  if (name.equals(Constants.INDEX_DELETABLE_FILENAME)) return true;
+  else if (name.equals(Constants.INDEX_SEGMENTS_FILENAME)) return true;

   else if (name.matches(".+\\.f\\d+")) return true;
   return false;



This really belongs in the index package.  That way, when we change 
the set of files in an index, the changes will be localized.


So this, LuceneFileFilter, Constants.INDEX_* and 
IndexReader.FILENAME_EXTENSIONS, should all be moved to a common home 
in the index package, like org.apache.lucene.index.IndexFileNames.  
Thoughts?


Yes, this makes sense. I will try to gather all the scattered filenames and 
extensions in a new class 
org.apache.lucene.index.IndexFileNames. Then we have everything in one 
place, and it will be much easier to maintain or even change.


Bernhard





Re: IndexFileNames

2005-06-07 Thread Bernhard Messer

Bernhard Messer wrote:


Doug Cutting wrote:


[EMAIL PROTECTED] wrote:

--- lucene/java/trunk/src/java/org/apache/lucene/store/FSDirectory.java (original)
+++ lucene/java/trunk/src/java/org/apache/lucene/store/FSDirectory.java Mon Jun  6 10:52:12 2005

@@ -52,8 +52,8 @@
 if (name.endsWith("."+IndexReader.FILENAME_EXTENSIONS[i]))
   return true;
   }
-  if (name.equals("deletable")) return true;
-  else if (name.equals("segments")) return true;
+  if (name.equals(Constants.INDEX_DELETABLE_FILENAME)) return true;
+  else if (name.equals(Constants.INDEX_SEGMENTS_FILENAME)) return true;

   else if (name.matches(".+\\.f\\d+")) return true;
   return false;




This really belongs in the index package.  That way, when we change 
the set of files in an index, the changes will be localized.


So this, LuceneFileFilter, Constants.INDEX_* and 
IndexReader.FILENAME_EXTENSIONS, should all be moved to a common home 
in the index package, like org.apache.lucene.index.IndexFileNames.  
Thoughts?


Sorry for the confusion. At first glance, I thought the new class 
IndexFileNames, containing the necessary constant values, would fit perfectly 
into org.apache.lucene.index. After a more detailed look, I get the 
feeling that it would be much better to place the new class in 
org.apache.lucene.store. That way we can avoid any dependency of 
FSDirectory on org.apache.lucene.index, which is much cleaner.


Why not create a new public final class 
org.apache.lucene.store.IndexFileNames and move LuceneFileFilter, 
Constants.INDEX_*, SegmentMerger.COMPOUND_EXTENSIONS, 
SegmentMerger.VECTOR_EXTENSIONS and IndexReader.FILENAME_EXTENSIONS to it?


Does that sound OK?

Bernhard







Re: svn-commit: 168449 FSDirectory

2005-06-06 Thread Bernhard Messer

Hi Daniel,

I just had a look at the new implementation in which FSDirectory deletes 
Lucene-related files only. I like the patch, but I think we left some 
room for optimization. In the current implementation, it is necessary to 
run through all known Lucene extensions (13 at the moment) for each call 
of LuceneFileFilter.accept(). When creating an index in a directory 
which contains several hundred files, this will definitely be a 
bottleneck. Creating a new index in a directory containing 100 files, 
we end up with 1300 calls to
if (name.endsWith("."+IndexReader.FILENAME_EXTENSIONS[i])), each of which 
needs to create a new StringBuffer to concatenate the two strings.


Therefore I would like to propose two changes:
1) store the extensions in a hash instead of a String[] to get a 
faster lookup

2) compare only the file extension itself, without prepending the "."
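
The two proposed changes can be sketched as follows. This is an illustration only, not the FSDirectory code: the class name is made up, the extension list is a partial sample, and the pattern check for per-field norms files (".f" followed by digits) from the original filter is left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ExtensionFilterSketch {
    // Partial, illustrative list of index file extensions.
    static final Set<String> EXTENSIONS = new HashSet<>(Arrays.asList(
        "cfs", "fnm", "fdx", "fdt", "tii", "tis", "frq", "prx", "del"));

    /** One hash lookup per file name instead of one endsWith() per extension. */
    static boolean accept(String name) {
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            // Files without an extension are matched by their full name.
            return name.equals("segments") || name.equals("deletable");
        }
        // Compare the bare extension; no "." + extension string is built.
        return EXTENSIONS.contains(name.substring(dot + 1));
    }

    public static void main(String[] args) {
        System.out.println(accept("_1.cfs"));
        System.out.println(accept("segments"));
        System.out.println(accept("notes.txt"));
    }
}
```

The cost per file name then no longer grows with the number of known extensions.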

any thoughts

Bernhard





Re: DO NOT REPLY [Bug 32965] - [PATCH] Use filter bits for next() and skipTo() in FilteredQuery

2005-04-04 Thread Bernhard Messer
Erik Hatcher wrote:
Oh, and one other thing Paul's code relies on JDK 1.4's assert  
keyword.  It seems this is an unnecessary reason to jump to 1.4  
dependence.

What do folks think about JDK 1.4 as a minimum Lucene requirement?
I'm not a fan of outdated software or legacy systems, so I think the 
best approach would be to keep Lucene 1.9 backward compatible 
and to make the switch to JDK 1.4 with Lucene 2.0.

Bernhard


Re: subclasses of abstract Query class are not implementing all methods

2005-03-11 Thread Bernhard Messer
David Spencer wrote:
Bernhard Messer wrote:
Hi,
I would like to cache query objects in a hash map. My implementation 
failed because not all of the Query classes implement the 
necessary method public int hashCode(). The same goes for 
public boolean equals(Object o), public String toString(String 
fieldName) and public String toString(). To force all subclasses 
of Query to implement these 4 methods, I would like to make them 
abstract within the base class and implement the missing ones in the 
subclasses. So in the Query class itself, it would look like this:

public abstract String toString(String field);
public abstract String toString();

What's the point of toString() w/o an argument - this doesn't really 
matter for Query does it?
Having a clean implementation of toString(), one would be able to 
reparse a Query with QueryParser. This is something which was discussed 
several times on the dev and user lists. I expect that it will work for 
all queries supported by QueryParser.


public abstract int hashCode();
public abstract boolean equals(Object o);
I think this would make the API cleaner and more usable.
Thoughts ???
Bernhard
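
A minimal sketch of the proposal, using hypothetical classes rather than the real Lucene Query hierarchy: redeclaring the Object methods as abstract in the base class makes the compiler reject any concrete subclass that does not implement them, and equal queries can then be used as hash map keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class QueryCacheSketch {
    /** Hypothetical base class; the abstract redeclarations force subclasses to implement all four methods. */
    static abstract class Query {
        public abstract String toString(String field);
        @Override public abstract String toString();
        @Override public abstract int hashCode();
        @Override public abstract boolean equals(Object o);
    }

    /** Hypothetical concrete query on a single field and term. */
    static final class SimpleTermQuery extends Query {
        private final String field;
        private final String text;

        SimpleTermQuery(String field, String text) {
            this.field = field;
            this.text = text;
        }

        /** Omits the field prefix when it equals the given default field. */
        @Override public String toString(String defaultField) {
            return field.equals(defaultField) ? text : field + ":" + text;
        }

        @Override public String toString() {
            return toString("");
        }

        @Override public int hashCode() {
            return Objects.hash(field, text);
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof SimpleTermQuery)) {
                return false;
            }
            SimpleTermQuery other = (SimpleTermQuery) o;
            return field.equals(other.field) && text.equals(other.text);
        }
    }

    public static void main(String[] args) {
        Map<Query, String> cache = new HashMap<>();
        cache.put(new SimpleTermQuery("title", "lucene"), "cached result");
        // A separately constructed but equal query finds the cached entry.
        System.out.println(cache.get(new SimpleTermQuery("title", "lucene")));
        System.out.println(new SimpleTermQuery("title", "lucene"));
    }
}
```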