Re: IndexFileNames

2005-06-07 Thread Bernhard Messer

Doug Cutting wrote:


[EMAIL PROTECTED] wrote:

--- lucene/java/trunk/src/java/org/apache/lucene/store/FSDirectory.java (original)
+++ lucene/java/trunk/src/java/org/apache/lucene/store/FSDirectory.java Mon Jun  6 10:52:12 2005
@@ -52,8 +52,8 @@
         if (name.endsWith("."+IndexReader.FILENAME_EXTENSIONS[i]))
           return true;
       }
-      if (name.equals("deletable")) return true;
-      else if (name.equals("segments")) return true;
+      if (name.equals(Constants.INDEX_DELETABLE_FILENAME)) return true;
+      else if (name.equals(Constants.INDEX_SEGMENTS_FILENAME)) return true;
       else if (name.matches(".+\\.f\\d+")) return true;
       return false;



This really belongs in the index package.  That way, when we change 
the set of files in an index, the changes will be localized.


So this, LuceneFileFilter, Constants.INDEX_* and 
IndexReader.FILENAME_EXTENSIONS, should all be moved to a common home 
in the index package, like org.apache.lucene.index.IndexFileNames.  
Thoughts?


Yes, this makes sense. I will try to gather all the scattered filenames and 
extensions into a new class, 
org.apache.lucene.index.IndexFileNames. Then we have everything in one 
place, and it will be much easier to maintain or even to change.


Bernhard


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: IndexFileNames

2005-06-07 Thread Bernhard Messer

Bernhard Messer wrote:


Doug Cutting wrote:


[EMAIL PROTECTED] wrote:

--- lucene/java/trunk/src/java/org/apache/lucene/store/FSDirectory.java (original)
+++ lucene/java/trunk/src/java/org/apache/lucene/store/FSDirectory.java Mon Jun  6 10:52:12 2005
@@ -52,8 +52,8 @@
         if (name.endsWith("."+IndexReader.FILENAME_EXTENSIONS[i]))
           return true;
       }
-      if (name.equals("deletable")) return true;
-      else if (name.equals("segments")) return true;
+      if (name.equals(Constants.INDEX_DELETABLE_FILENAME)) return true;
+      else if (name.equals(Constants.INDEX_SEGMENTS_FILENAME)) return true;
       else if (name.matches(".+\\.f\\d+")) return true;
       return false;




This really belongs in the index package.  That way, when we change 
the set of files in an index, the changes will be localized.


So this, LuceneFileFilter, Constants.INDEX_* and 
IndexReader.FILENAME_EXTENSIONS, should all be moved to a common home 
in the index package, like org.apache.lucene.index.IndexFileNames.  
Thoughts?


Sorry for the confusion. At first glance, I thought the new class 
IndexFileNames, containing the necessary constant values, would fit perfectly 
into org.apache.lucene.index. After a more detailed look, I get the 
feeling that it would be much better to place the new class in 
org.apache.lucene.store. That way, we can avoid all dependencies from 
FSDirectory on org.apache.lucene.index, which is very clean.


Why not create a new public final class 
org.apache.lucene.store.IndexFileNames and move LuceneFileFilter, 
Constants.INDEX_*, SegmentMerger.COMPOUND_EXTENSIONS, 
SegmentMerger.VECTOR_EXTENSIONS and IndexReader.FILENAME_EXTENSIONS to it?


Does that sound OK?

Bernhard







Re: IndexFileNames

2005-06-07 Thread Doug Cutting

Bernhard Messer wrote:
Sorry for the confusion. At first glance, I thought the new class 
IndexFileNames, containing the necessary constant values, would fit perfectly 
into org.apache.lucene.index. After a more detailed look, I get the 
feeling that it would be much better to place the new class in 
org.apache.lucene.store. That way, we can avoid all dependencies from 
FSDirectory on org.apache.lucene.index, which is very clean.


I think that's an illusion: the store package would actually become more 
dependent on the index package.  If someone changes the set of files in 
an index then the changes will not be localized to the index package. 
Nothing outside of the index package should know anything about the 
internal structure of an index.


If instead the index package exposes a public API that permits other 
packages to inquire whether particular file names belong to an index, 
then only a small dependency on what should be a stable API is exposed. 
Changes to index structure can be made without changing anything 
outside of the index package.
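
A minimal sketch of the kind of API described above, not code from this thread. The class name IndexFileNameFilterSketch is hypothetical. The file names are taken from the FSDirectory diff and the extension list from the ExtensionMatcher example posted in this thread; either may not match the actual index format.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: the class name and layout are assumptions.
final class IndexFileNameFilterSketch implements FilenameFilter {

    // Private on purpose: callers never see the list of extensions.
    private static final Set<String> EXTENSIONS = new HashSet<>(Arrays.asList(
        "cfs", "fnm", "fdx", "fdt", "tii", "tis", "frq", "prx", "del",
        "tvx", "tvd", "tvf", "tvp"));

    // The only public entry point: does this file name belong to an index?
    @Override
    public boolean accept(File dir, String name) {
        if (name.equals("segments") || name.equals("deletable")) {
            return true;
        }
        // Same pattern as in the diff: names ending in ".f" plus digits.
        if (name.matches(".+\\.f\\d+")) {
            return true;
        }
        int dot = name.lastIndexOf('.');
        return dot >= 0 && EXTENSIONS.contains(name.substring(dot + 1));
    }
}
```

Because only accept() is visible, a caller such as FSDirectory would depend on that one method, and the set of names could change without touching the store package.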


Why not create a new public final class 
org.apache.lucene.store.IndexFileNames and move LuceneFileFilter, 
Constants.INDEX_*, SegmentMerger.COMPOUND_EXTENSIONS, 
SegmentMerger.VECTOR_EXTENSIONS and IndexReader.FILENAME_EXTENSIONS to it?


I still think this class should be in the index package.  I'm not 
convinced that anything other than the FileFilter needs to be public.


Doug




Re: svn-commit: 168449 FSDirectory

2005-06-07 Thread Chris Nokleberg
On Tue, 07 Jun 2005 09:13:10 +0200, Bernhard Messer wrote:
 Exactly, that's what I had in mind. I know that we have to allocate a 
 new string object for every call, but this must be much cheaper than the 
 current implementation, which has to walk through the whole array every 
 time the method is called.

If you're still concerned about performance you can use a trie data
structure, which takes a little effort to build but can match very quickly
(with no allocations). I've attached an example implementation that works
for filename extensions which only use the letters 'a'-'z' (minimally
tested):

  private static final ExtensionMatcher MATCHER =
    new ExtensionMatcher(new String[]{
      "cfs", "fnm", "fdx", "fdt", "tii",
      "tis", "frq", "prx", "del", "tvx",
      "tvd", "tvf", "tvp",
    });

  MATCHER.match("foo.cfs"); // returns true

BTW, I think it is a bad idea to have FILENAME_EXTENSIONS as a public
array. It being final does not prevent code from changing the contents of
the array. If the extensions must be public I would recommend either an
accessor that returns a copy of the array, or an unmodifiable
collection/set/list.
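
To illustrate the point about final arrays, here is a small sketch. The class ExtensionExposureSketch and its members are made up for the example and the extension list is shortened; it is not the Lucene code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: all names here are assumptions for illustration.
final class ExtensionExposureSketch {

    // Problem: 'final' fixes the reference, not the elements.
    // Any caller can still write LEAKY[0] = "xyz".
    public static final String[] LEAKY = {"cfs", "fnm", "fdx"};

    // Keep the real array private.
    private static final String[] EXTENSIONS = {"cfs", "fnm", "fdx"};

    // Option 1: an accessor that returns a fresh copy on every call.
    static String[] getExtensions() {
        return EXTENSIONS.clone();
    }

    // Option 2: an unmodifiable list; set() and add() throw
    // UnsupportedOperationException.
    static final List<String> EXTENSION_LIST =
        Collections.unmodifiableList(Arrays.asList(EXTENSIONS.clone()));
}
```

The copy costs an allocation per call, while the unmodifiable list is built once and shared, so the list is probably the better fit if the extensions are read often.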

Chris



import java.util.*;

/**
 * Matches file names by extension using a trie over the reversed
 * extensions. In each node, slots 0-25 hold the child for 'a'-'z' and
 * slot 26 marks the end of a complete extension.
 */
public class ExtensionMatcher
{
    private Object[] tree;

    public ExtensionMatcher(String[] exts)
    {
        tree = create(Arrays.asList(exts), 0);
    }

    private static Object[] create(List exts, int index)
    {
        Object[] array = new Object[27];
        Map byLetter = new HashMap();
        for (Iterator it = exts.iterator(); it.hasNext();) {
            String ext = (String)it.next();
            int length = ext.length();
            if (length > index) {
                // Group by the character 'index' positions from the end.
                Character c = Character.valueOf(ext.charAt(length - 1 - index));
                List list = (List)byLetter.get(c);
                if (list == null)
                    byLetter.put(c, list = new ArrayList());
                list.add(ext);
            } else if (length == index) {
                // This extension is fully consumed at this node.
                array[26] = Boolean.TRUE;
            }
        }
        for (Iterator it = byLetter.keySet().iterator(); it.hasNext();) {
            Character c = (Character)it.next();
            char val = c.charValue();
            if (val < 'a' || val > 'z')
                throw new IllegalArgumentException("Extension must be between 'a' and 'z'");
            array[val - 'a'] = create((List)byLetter.get(c), index + 1);
        }
        return array;
    }

    public boolean match(String file)
    {
        int index = 0;
        int length = file.length();
        Object[] array = tree;
        // Walk the file name backwards from its last character.
        for (;;) {
            if (index >= length)
                return false;
            char val = file.charAt(length - 1 - index);
            if (val == '.' && array[26] != null)
                return true;
            if (val < 'a' || val > 'z')
                return false;
            array = (Object[])array[val - 'a'];
            if (array == null)
                return false;
            index++;
        }
    }
}






FAQ entry about deletions

2005-06-07 Thread Daniel Naber
Hi,

the FAQ contains this sentence:

Document that are deleted really are in deleted (???)

What is that supposed to mean? Could someone rephrase it?

Regards
 Daniel

-- 
http://www.danielnaber.de




Re: svn commit: r180010 - in /lucene/java/trunk/src: java/org/apache/lucene/index/IndexReader.java java/org/apache/lucene/index/SegmentInfos.java test/org/apache/lucene/index/TestIndexReader.java

2005-06-07 Thread Daniel Naber
On Monday 06 June 2005 18:23, Bernhard Messer wrote:

 I would suggest removing the deprecated flags from the methods
 getCurrentVersion() and lastModified().

Okay, I just did that and rephrased the javadoc so that lastModified should 
not be used to check whether a reader is up-to-date.

Regards
 Daniel

-- 
http://www.danielnaber.de
